Abstract

This paper considers issues related to multiple structural changes, occurring at unknown dates, in the linear regression model estimated by least squares. The main aspects are the properties of the estimators, including the estimates of the break dates, and the construction of tests that allow inference to be made about the presence of structural change and the number of breaks. We consider the general case of a partial structural change model where not all parameters are subject to shifts. We show convergence at rate $T$ of the estimates of the break fractions. We also discuss a procedure that allows one to test the null hypothesis of, say, $\ell$ changes, versus the alternative hypothesis of $\ell + 1$ changes. This is particularly useful in that it allows a specific-to-general modeling strategy to consistently determine the appropriate number of changes present. An estimation strategy for which the location of the breaks need not be simultaneously determined is discussed. Instead, our method successively estimates each break point. Empirical applications are presented to illustrate the usefulness of the various procedures.


Keywords: Asymptotic Distribution, Change point, Rate of convergence, Model selection, Dynamic programming.

1      Introduction.

This paper considers issues related to multiple structural changes in the linear regression model estimated by minimizing the sum of squared residuals. Throughout, we treat the dates of the breaks as unknown variables to be estimated. The main aspects considered are the properties of the estimators, including the estimates of the break dates, and the construction of tests that allow inference to be made about the presence of structural change and the number of breaks. To that effect we discuss tests of the null hypothesis of no structural change versus an arbitrary number of changes as well as tests of the null hypothesis of, say, $\ell$ versus $\ell + 1$ changes.

Both the statistics and econometrics literatures contain a vast amount of work on issues related to structural change, most of it specifically designed for the case of a single change². The econometric literature has recently witnessed an upsurge of interest in extending procedures to various models with an unknown change point, thereby offering serious alternatives to the CUSUM test of Brown, Durbin and Evans (1975).

With respect to the problem of testing for structural change, recent contributions include the comprehensive treatment of Andrews (1993) who considers sup Wald, Likelihood Ratio and Lagrange Multiplier tests. Weighted versions of these tests satisfying some asymptotic optimality criterion are discussed in Andrews and Ploberger (1992). Recent studies also consider econometric models with trending regressors, unit roots, cointegrated variables and serial correlation³. Methods allowing the investigator to be agnostic about the presence or absence of integrated variables are presented in Perron (1991) and Vogelsang (1993). The issue of structural change has also received a lot of attention in the recent debate on unit root versus structural change in the trend function of a univariate time series⁴. Yet, all these recent developments consider only the case of a single structural change.

Issues about the distributional properties of the parameter estimates, in particular those of the break dates, have received somewhat less attention despite their importance. The work of Bai (1994, 1995a) contains general results concerning the asymptotic distribution of the estimated break date when a single break occurs, in particular the fact that the estimated break fraction converges to its true value at rate $T$.

² For a survey, see Krishnaiah and Miao (1988) and Zacks (1983) as well as the comprehensive treatment of Deshayes and Picard (1986).

³ See, among others, Christiano (1992), Chu and White (1992), Kim and Siegmund (1989) and Perron (1991) (trending regressors), Krämer, Ploberger and Alt (1988) (serial correlation), Bai, Lumsdaine and Stock (1994) and Hansen (1990, 1992) (models with integrated variables).

⁴ See Perron (1989, 1993, 1994), Banerjee, Lumsdaine and Stock (1992), Zivot and Andrews (1992), Perron and Vogelsang (1992) and Gregory and Hansen (1993).

In  comparison,  the  literature  addressing  the  issue  of  multiple  structural  changes 
is  relatively  sparse.  Recent  developments  include  Andrews,  Lee  and  Ploberger  (1992) 
who  consider  optimal  tests  in  the  linear  model  with  known  variance.  Garcia  and 
Perron (1994) study the sup Wald test for two changes in a dynamic time series⁵. To
our  knowledge,  the  most  comprehensive  treatment  is  that  of  Liu,  Wu  and  Zidek  (1994) 
who  consider,  as  we  do,  multiple  shifts  in  a  linear  model  estimated  by  least  squares 
(in  the  context  of  a  more  general  multiple  thresholds  model).  They  study  the  rate 
of  convergence  of  the  estimated  break  dates,  as  well  as  the  consistency  of  a  modified 
Schwarz  model  selection  criterion  to  determine  the  number  of  breaks.  Their  analysis 
considers  only  the  so-called  pure-structural  change  case  where  all  the  parameters  are 
subject  to  shifts. 

Our  assumptions  are  less  restrictive  than  those  of  Liu,  Wu  and  Zidek  (1994). 
Furthermore,  we  consider  the  more  general  case  of  a  partial  structural  change  where 
not  all  parameters  are  subject  to  shifts.  Concerning  the  asymptotic  behavior  of  the 
estimates of the break dates, we improve on the rate they report, $T/\ln^2(T)$, by showing
convergence  at  rate  T.  We  also  consider  the  asymptotic  distribution  and  confidence 
intervals  of  the  estimates  of  the  break  dates. 

Our study considers, in addition, the important problem of testing for multiple structural changes. To that effect we present sup Wald type tests for the null hypothesis of no change versus an alternative hypothesis containing an arbitrary number of changes. We also discuss procedures that allow one to test the null hypothesis of, say, $\ell$ changes, versus the alternative hypothesis of $\ell + 1$ changes. This is particularly useful in that it allows a specific-to-general modeling strategy to consistently determine the appropriate number of changes present (thereby avoiding the use of a model selection criterion which requires the estimation of the model for all possible numbers of breaks up to some a priori specified maximum). Finally, our paper contains a discussion of an estimation strategy for which the location of the breaks need not be simultaneously determined. Rather, our method successively estimates each break point.

⁵ Some contributions include Fu and Curnow (1990) who discuss maximum likelihood estimation of multiple shifts in a somewhat restrictive binomial model. Yao (1988) considers estimating the number of breaks in the mean of a sequence of normal random variables based on the BIC criterion. Yao and Au (1988) treat the estimation of multiple mean breaks in a sequence of random variables and consider estimating the number of breaks using the BIC criterion. Yin (1988) uses the moving-window nonparametric technique to estimate the breaks in a sequence of random variables. Also, Feder (1975) considers estimating the join points of polynomial type segmented regressions (non-discrete shifts). Other relevant contributions include Kim (1993) and Lumsdaine and Papell (1995).

There  are  many  practical  advantages  arising  from  the  estimation  and  inference  of 
models  with  structural  changes.  To  mention  a  few,  we  first  note  that  it  allows  the 
identification  of  events  that  may  have  fostered  the  structural  changes.  For  example, 
an  approach  often  used  to  examine  the  effectiveness  of  policy  changes  involves  dummy 
variable  regressions  and  inference  on  the  corresponding  regression  coefficient.  An 
alternative  is  to  compare  the  estimated  break  date  with  the  effective  date  of  a  policy 
change  (or  policy  implementation).  Another  potentially  useful  aspect  is  in  the  field 
of  forecasting.  Indeed,  if  many  regimes  are  present  in  a  given  sample,  using  the  most 
recent  regime  may  lead  to  better  forecasts. 

The rest of this paper is structured as follows. Section 2 discusses the model and the assumptions imposed on the variables and the innovations. Section 3 contains results pertaining to the consistency, the rate of convergence and the asymptotic distribution of the estimates of the break dates (as well as the other parameters of the model). Section 4 proposes test statistics, derives their asymptotic distributions and presents critical values. Section 5 discusses sequential methods used to estimate the model without treating all break points simultaneously. Section 6 presents empirical applications. Appendix A contains some mathematical derivations and Appendix B a description of a procedure to obtain estimates with multiple changes based on the principle of dynamic programming.

2      The  Model  and  Assumptions. 

Consider the following multiple linear regression with $m$ breaks ($m + 1$ regimes):

$$
\begin{aligned}
y_t &= x_t'\beta + z_t'\delta_1 + u_t, & t &= 1, 2, \ldots, T_1, \\
y_t &= x_t'\beta + z_t'\delta_2 + u_t, & t &= T_1 + 1, \ldots, T_2, \\
&\;\;\vdots \\
y_t &= x_t'\beta + z_t'\delta_{m+1} + u_t, & t &= T_m + 1, \ldots, T,
\end{aligned}
\tag{1}
$$

where $y_t$ is the observed dependent variable at time $t$; $x_t$ ($p \times 1$) and $z_t$ ($q \times 1$) are vectors of covariates and $\beta$ and $\delta_j$ ($j = 1, \ldots, m+1$) are the corresponding vectors of coefficients; $u_t$ is the disturbance at time $t$. The indices $(T_1, \ldots, T_m)$, or the break points, are explicitly treated as unknown. The purpose is to estimate the unknown regression coefficients together with the break points when $T$ observations on $(y_t, x_t, z_t)$ are available.


Note that this is a partial structural change model in the sense that the parameter vector $\beta$ is not subject to shifts and is effectively estimated using the entire sample. When $p = 0$, we obtain a pure structural change model where all the coefficients are subject to change. A partial structural change model is therefore more general and includes the latter as a special case.
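To make the data structure of model (1) concrete, the following is a minimal simulation sketch. It is not part of the paper: the function name `simulate_partial_break`, the Gaussian regressors and errors, and the choice of a constant as the first element of $z_t$ are all illustrative assumptions.

```python
import random

def simulate_partial_break(T, breaks, beta, deltas, sigma=1.0, seed=0):
    """Simulate (y, X, Z) from model (1); all design choices are illustrative.

    breaks : break dates (T_1, ..., T_m), counted as in the text
    beta   : p coefficients that never shift (use [] for p = 0)
    deltas : m + 1 coefficient vectors of length q, one per regime
    """
    rng = random.Random(seed)
    bounds = [0] + list(breaks) + [T]          # convention T_0 = 0, T_{m+1} = T
    y, X, Z = [], [], []
    for j, delta_j in enumerate(deltas):       # regime j + 1
        for _ in range(bounds[j], bounds[j + 1]):
            x_t = [rng.gauss(0.0, 1.0) for _ in beta]
            # assumed design: a constant followed by Gaussian covariates
            z_t = [1.0] + [rng.gauss(0.0, 1.0) for _ in delta_j[1:]]
            u_t = rng.gauss(0.0, sigma)
            y_t = (sum(a * b for a, b in zip(x_t, beta))
                   + sum(a * b for a, b in zip(z_t, delta_j))
                   + u_t)
            y.append(y_t)
            X.append(x_t)
            Z.append(z_t)
    return y, X, Z

# Two breaks (three regimes), p = 1 and q = 2.
y, X, Z = simulate_partial_break(T=120, breaks=[40, 80], beta=[0.5],
                                 deltas=[[0.0, 1.0], [2.0, 1.0], [2.0, -1.0]])
```

Passing `beta=[]` gives the pure structural change case $p = 0$ described above.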

To proceed, it is convenient to introduce some terminology. First, we call an m-partition (or simply a partition) of the integers $(1, \ldots, T)$ an m-tuple of integers $(T_1, \ldots, T_m)$ such that $1 < T_1 < \cdots < T_m < T$. Note also that, throughout, we shall use the convention that $T_0 = 0$ and $T_{m+1} = T$. Second, define the block-diagonal matrix

$$
\bar Z = \begin{pmatrix} Z_1 & & \\ & \ddots & \\ & & Z_{m+1} \end{pmatrix}
$$

with $Z_i = (z_{T_{i-1}+1}, \ldots, z_{T_i})'$. The matrix $\bar Z$ is said to diagonally partition $Z = (z_1, \ldots, z_T)'$ at $(T_1, \ldots, T_m)$. Using these definitions, the multiple linear regression system (1) may be expressed in matrix form as

$$
Y = X\beta + \bar Z\delta + U,
$$

where $Y = (y_1, \ldots, y_T)'$, $X = (x_1, \ldots, x_T)'$, $U = (u_1, \ldots, u_T)'$, $\delta = (\delta_1', \delta_2', \ldots, \delta_{m+1}')'$, and $\bar Z$ is the matrix which diagonally partitions $Z$ at $(T_1, \ldots, T_m)$.
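The diagonal partition can be illustrated with a short sketch. The helper name `diag_partition` and the list-of-rows representation are assumptions made here for illustration only; the point is that row $t$ of the partitioned matrix carries $z_t'$ in the column block of the regime containing $t$ and zeros elsewhere.

```python
def diag_partition(Z, breaks):
    """Return the T x q(m+1) matrix that diagonally partitions Z at `breaks`.

    Z      : list of T rows, each a list of length q
    breaks : break dates (T_1, ..., T_m); regime j covers t = T_{j-1}+1, ..., T_j
    """
    T, q, m = len(Z), len(Z[0]), len(breaks)
    bounds = [0] + list(breaks) + [T]           # T_0 = 0 and T_{m+1} = T
    Zbar = [[0.0] * (q * (m + 1)) for _ in range(T)]
    for j in range(m + 1):
        for t in range(bounds[j], bounds[j + 1]):   # 0-based row index
            Zbar[t][j * q:(j + 1) * q] = Z[t]
    return Zbar

Z = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Zbar = diag_partition(Z, breaks=[1])            # one break after the first row
delta = [1.0, 1.0, 2.0, 0.0]                    # stacked (delta_1', delta_2')'
fitted = [sum(a * b for a, b in zip(row, delta)) for row in Zbar]
```

Multiplying the partitioned matrix by the stacked $\delta$ reproduces $z_t'\delta_j$ regime by regime, which is all the matrix form expresses.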

Throughout, we denote the true value of a parameter with a 0 superscript or subscript. In particular, $\delta^0 = (\delta_1^{0\prime}, \ldots, \delta_{m+1}^{0\prime})'$ and $(T_1^0, \ldots, T_m^0)$ are used to denote, respectively, the true values of the parameters $\delta$ and the true break points. The matrix $\bar Z^0$ is the one which diagonally partitions $Z$ at $(T_1^0, \ldots, T_m^0)$. Hence, the data-generating process is assumed to be

$$
Y = X\beta^0 + \bar Z^0\delta^0 + U.
$$

The goal is to estimate the unknown coefficients $(\beta^0, \delta_1^0, \ldots, \delta_{m+1}^0, T_1^0, \ldots, T_m^0)$, assuming $\delta_i^0 \neq \delta_{i+1}^0$ ($1 \leq i \leq m$). In general, the number of breaks $m$ can be treated as an unknown variable with true value $m^0$. However, for now, we treat it as known and discuss methods of estimating it in later sections. We also postpone the problem of testing for the presence of structural change to Section 4.

The method of estimation considered is that based on the least-squares principle. For each m-partition $(T_1, \ldots, T_m)$, the associated least-squares estimates of $\beta$ and $\delta_j$ are obtained by minimizing the sum of squared residuals

$$
(Y - X\beta - \bar Z\delta)'(Y - X\beta - \bar Z\delta) = \sum_{i=1}^{m+1} \sum_{t=T_{i-1}+1}^{T_i} [y_t - x_t'\beta - z_t'\delta_i]^2.
$$

Let $\hat\beta(\{T_j\})$ and $\hat\delta(\{T_j\})$ denote the resulting estimates based on the given m-partition $(T_1, \ldots, T_m)$ denoted $\{T_j\}$. Substituting these estimates in the objective function and denoting the resulting sum of squared residuals as $S_T(T_1, \ldots, T_m)$, the estimated break points $(\hat T_1, \ldots, \hat T_m)$ are such that

$$
(\hat T_1, \ldots, \hat T_m) = \arg\min_{T_1, \ldots, T_m} S_T(T_1, \ldots, T_m), \tag{2}
$$

where the minimization is taken over all partitions $(T_1, \ldots, T_m)$ such that $T_i - T_{i-1} > q$. Thus the break-point estimators are global minimizers of the objective function. Finally, the regression parameter estimates are obtained using the associated least-squares estimates at the estimated m-partition $\{\hat T_j\}$, i.e.

$$
\hat\beta = \hat\beta(\{\hat T_j\}), \qquad \hat\delta = \hat\delta(\{\hat T_j\}). \tag{3}
$$
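As an illustration of (2) and (3), the sketch below enumerates every admissible partition and keeps the one with the smallest sum of squared residuals. It is a brute-force illustration, not the authors' GAUSS program: the function names, the minimum segment length `h` (a user-chosen trimming assumed here) and the simulated data are all hypothetical.

```python
from itertools import combinations
import numpy as np

def ssr_given_partition(y, X, Z, breaks):
    """S_T(T_1, ..., T_m): SSR from regressing y on [X, Zbar] for one partition."""
    T, q = Z.shape
    bounds = [0, *breaks, T]
    Zbar = np.zeros((T, q * (len(breaks) + 1)))
    for j in range(len(bounds) - 1):
        lo, hi = bounds[j], bounds[j + 1]
        Zbar[lo:hi, j * q:(j + 1) * q] = Z[lo:hi]
    W = np.hstack([X, Zbar])
    coef, *_ = np.linalg.lstsq(W, y, rcond=None)
    resid = y - W @ coef
    return float(resid @ resid), coef

def estimate_breaks_bruteforce(y, X, Z, m, h):
    """Global minimiser of S_T over partitions whose segments have length >= h."""
    T = len(y)
    best = (np.inf, None, None)
    for breaks in combinations(range(h, T - h + 1), m):
        bounds = [0, *breaks, T]
        if min(b - a for a, b in zip(bounds, bounds[1:])) < h:
            continue                      # a segment is too short
        ssr, coef = ssr_given_partition(y, X, Z, breaks)
        if ssr < best[0]:
            best = (ssr, breaks, coef)
    return best                           # (SSR, break dates, (beta, delta) stacked)

# Simulated example: one shift of size 3 in the intercept at t = 30.
rng = np.random.default_rng(0)
T = 60
X = rng.normal(size=(T, 1))
Z = np.ones((T, 1))
shift = np.where(np.arange(T) < 30, 0.0, 3.0)
y = 0.5 * X[:, 0] + shift + 0.3 * rng.normal(size=T)
ssr, breaks, coef = estimate_breaks_bruteforce(y, X, Z, m=1, h=5)
print(breaks, coef.round(2))
```

The number of partitions grows like $T^m$, so this enumeration is only practical for small $m$ and $T$.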

Since the break points are discrete parameters and can only take a finite number of values, they can be estimated by grid search. In the case of a pure structural change model, an efficient procedure for finding global minimizers is available through a dynamic programming approach. This allows the estimates to be calculated using a number of sums of squared residuals (corresponding to the different possible partitions) that is of order $O(T^2)$ for any $m > 2$. The calculations needed can be significantly reduced further using standard updating formulae for recursive residuals. In effect, the procedure amounts to computing $T$ sets of recursive residuals and performing pairwise comparisons of the associated sum of squared residuals. The method can easily be extended, in an iterative fashion, to cover the case of a partial structural change model. These issues are discussed in detail in Appendix B⁶.
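The dynamic programming idea for the pure structural change case can be sketched as follows. This is a generic textbook recursion written for illustration, not a transcription of Appendix B: the function names and the minimum segment length `h` are assumptions, and each segment regression is recomputed from scratch rather than updated through recursive residuals.

```python
import numpy as np

def segment_ssr_table(y, Z, h):
    """ssr[i, j] = SSR of y on Z over observations i..j-1 (inf if shorter than h)."""
    T = len(y)
    ssr = np.full((T + 1, T + 1), np.inf)
    for i in range(T - h + 1):
        for j in range(i + h, T + 1):
            coef, *_ = np.linalg.lstsq(Z[i:j], y[i:j], rcond=None)
            r = y[i:j] - Z[i:j] @ coef
            ssr[i, j] = r @ r
    return ssr

def dp_breaks(y, Z, m, h):
    """Global SSR minimiser over m-partitions when all coefficients shift (p = 0)."""
    T = len(y)
    ssr = segment_ssr_table(y, Z, h)          # O(T^2) segment regressions
    # cost[k, j]: minimal SSR when observations 0..j-1 are split by k breaks
    cost = np.full((m + 1, T + 1), np.inf)
    last = np.zeros((m + 1, T + 1), dtype=int)
    cost[0] = ssr[0]
    for k in range(1, m + 1):
        for j in range((k + 1) * h, T + 1):
            # candidate positions b of the k-th break, with b in [k*h, j-h]
            cand = cost[k - 1, k * h:j - h + 1] + ssr[k * h:j - h + 1, j]
            b = int(np.argmin(cand))
            cost[k, j] = cand[b]
            last[k, j] = k * h + b
    breaks, j = [], T
    for k in range(m, 0, -1):                 # backtrack from the full sample
        j = int(last[k, j])
        breaks.append(j)
    return float(cost[m, T]), tuple(reversed(breaks))

# Simulated mean-shift example with breaks at t = 20 and t = 40.
rng = np.random.default_rng(1)
T = 60
Z = np.ones((T, 1))
y = np.repeat([0.0, 3.0, 0.0], 20) + 0.3 * rng.normal(size=T)
ssr_min, breaks = dp_breaks(y, Z, m=2, h=3)
print(breaks)
```

Once the table of segment sums of squared residuals is available, adding a further break only requires another pass over the table, which is why the number of regressions does not grow with $m$.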

The  statistical  properties  of  the  resulting  estimators  are  studied  in  the  next  section 
under  the  following  set  of  assumptions. 

⁶ A GAUSS program that calculates global minimizers using this dynamic programming approach is available from the authors upon request. This program also has procedures to compute the other tests and statistics discussed in this paper.

A1. Let $w_t = (x_t', z_t')'$, $W = (w_1, \ldots, w_T)'$, and $\bar W^0$ be the diagonal partition of $W$ at the true break points $(T_1^0, \ldots, T_m^0)$ such that $\bar W^0 = \mathrm{diag}(W_1^0, \ldots, W_{m+1}^0)$. We assume for each $i = 1, \ldots, m+1$, that $W_i^{0\prime} W_i^0 / (T_i^0 - T_{i-1}^0)$ converges to a non-random positive definite matrix (with $T_0^0 = 1$ and $T_{m+1}^0 = T$). The limiting matrices need not be the same for all $i$.

A2. For large $\ell$, the minimum eigenvalues of $\frac{1}{\ell} \sum_{t=T_i^0+1}^{T_i^0+\ell} w_t w_t'$ and of $\frac{1}{\ell} \sum_{t=T_i^0-\ell}^{T_i^0} w_t w_t'$ are bounded away from zero ($i = 1, \ldots, m+1$).

A3. The matrix $A_{k\ell} = \sum_{t=k}^{\ell} z_t z_t'$ is invertible for $\ell - k > q$, the dimension of $z_t$.

A4. The sequence of errors $\{u_t\}$ satisfies either of the following two sets of conditions:

i) Let $\{\mathcal{F}_i : i = \ldots, -2, -1, 0, 1, 2, \ldots\}$ be a sequence of increasing $\sigma$-fields. Assume that $\{u_i, \mathcal{F}_i\}$ forms an $L^r$-mixingale sequence with $r = 4 + \delta$ for some $\delta > 0$ [McLeish (1975) and Andrews (1988)]. That is, there exist nonnegative constants $\{c_i : i \geq 1\}$ and $\{\psi_m : m \geq 0\}$ such that $\psi_m \downarrow 0$ as $m \to \infty$ and for all $i \geq 1$ and $m \geq 0$, we have

(a) $\|E(u_i \mid \mathcal{F}_{i-m})\|_r \leq c_i \psi_m$,

(b) $\|u_i - E(u_i \mid \mathcal{F}_{i+m})\|_r \leq c_i \psi_{m+1}$,

where $\|X\|_r = (E|X|^r)^{1/r}$. We assume in addition that

(c) $\max_i c_i \leq K < \infty$,

(d) $\sum_{m=0}^{\infty} \psi_m < \infty$,

and

(e) The disturbance $u_t$ is independent of the regressors $\{z_s, x_s\}$ for all $t$ and $s$.

ii) Let $\mathcal{F}_t^* = \sigma\text{-field}\{\ldots, w_{t-1}, w_t, \ldots, u_{t-2}, u_{t-1}\}$.

a) We assume that $\{u_t\}$ is a martingale difference sequence relative to $\{\mathcal{F}_t^*\}$ satisfying $E(u_t \mid \mathcal{F}_t^*) = 0$, and $\sup_t E|u_t|^{4+\delta} < \infty$.

b) We have

$$
\mathrm{plim}\; T^{-1} \sum_{t=1}^{[Tv]} z_t z_t' = Q(v),
$$

uniformly in $v \in [0, 1]$, where $Q(v)$ is positive definite for $v > 0$ and strictly increasing in $v$ (i.e. $Q(v) - Q(u)$ is positive definite for $v > u$).

c) If the disturbances $u_t$ are not independent of the regressors $\{z_s\}$ for all $t$ and $s$, the minimization problem defined by (2) is taken over all possible partitions such that $T_i - T_{i-1} > \epsilon T$ ($i = 1, \ldots, m+1$) for some $\epsilon > 0$ (note that this is not required under part (i)).

A5. $T_i^0 = [T\lambda_i^0]$, where $0 < \lambda_1^0 < \cdots < \lambda_m^0 < 1$.

Assumption A1 is standard for multiple linear regressions. Assumption A2 requires that there be enough observations near the true break points so that they can be identified. Now consider A3. Because the break points are estimated by a global least-squares search, we require the sum $\sum_{t=k}^{\ell} z_t z_t'$ to be invertible for $\ell - k > q$. In particular, no segment should contain fewer than $q$ observations, as an exact fit is otherwise obtained. If we impose the number of observations in each segment to be at least some fixed number $h$ ($h > q$, not depending on $T$), the invertibility requirement in A3 can be weakened to hold for all combinations $(\ell, k)$ for which $\ell - k > h$. Note that A3 is actually for technical convenience and could be dispensed with. This would require the use of generalized inverses. We assume, for simplicity, the existence of the inverse of $A_{k\ell}$, but the proof goes through with generalized inverses at the expense of a greater technical burden.

The assumptions stated in A4 pertain to two specific cases related to the presence or absence of a lagged dependent variable in $z_t$. The conditions described in part (i) pertain to the case where no lagged dependent variables are allowed in $z_t$. In this case, the conditions on the residuals are quite general and allow substantial correlation. A mixingale sequence includes many processes as special cases, such as martingale differences, strong mixing processes, linear processes, and functions of mixing processes; see Andrews (1988) for details. The sequence $\{u_t\}$ need not be stationary but the existence of a uniformly bounded moment of order $4 + \delta$ is required (this can be seen by noting that condition (c) is, in most cases, equivalent to $\sup_i \|u_i\|_r < \infty$). Condition (e) precludes the presence of lagged dependent variables in the regressors.

Part (ii) of Assumption A4 considers the case where lagged dependent variables are allowed as regressors. In this case, no serial correlation is permitted in the errors $\{u_t\}$. The requirement that $\{z_t u_t\}$ form a martingale difference sequence permits weak convergence of the partial sums of $T^{-1/2} z_t u_t$ started at $t = T_{i-1}^0 + 1$. This extra generality is obtained at the expense of some restrictions on the admissible partitions. If a lagged dependent variable is present in the $z_t$, each segment considered must contain a positive fraction of the total sample. This is not constraining from a practical point of view since $\epsilon$ can be arbitrarily small. Note that no such restriction is necessary if a lagged dependent variable is present in the $x_t$'s. In both cases, the assumptions are general enough to allow different distributions for both the regressors and the errors in each segment.

The possibility of lagged dependent variables is potentially quite useful if the parameters associated with the dynamics of the dependent variables are not subject to structural change. In this case, the investigator can take these dynamic effects into account either in a direct parametric fashion (e.g. introducing lagged dependent variables so as to have uncorrelated residuals) or using an indirect nonparametric approach (e.g. leaving the dynamics in the disturbances and applying a nonparametric correction for proper asymptotic inference). This trade-off can be useful to distinguish gradual from sudden changes the same way a distinction is made between innovational and additive outliers.

Assumption A5 is a standard requirement to permit the development of an asymptotic theory and allows the break points to be asymptotically distinct. It essentially requires the asymptotic experiments to be carried out under the assumption that each segment increases proportionately as the sample size increases. We refer to the quantities $(\lambda_1^0, \ldots, \lambda_m^0)$ as the break fractions and we let $\lambda_0^0 = 0$ and $\lambda_{m+1}^0 = 1$.

3      Consistency  and  Limiting  Distributions. 

In this section, we are interested in the consistency property of the estimated break fractions and especially the rate of convergence. The result about the rate of convergence will allow us to derive results about the asymptotic distribution of the estimates of the break dates as well as the estimated regression coefficients. We let $\hat\lambda = (\hat\lambda_1, \ldots, \hat\lambda_m)$ with corresponding true values $\lambda^0 = (\lambda_1^0, \ldots, \lambda_m^0)$. We shall first show that $\hat\lambda$ is consistent for $\lambda^0$ and later that the rate of convergence is $T$. As a matter of notation, we let "$\xrightarrow{p}$" denote convergence in probability, "$\xrightarrow{d}$" convergence in distribution, and "$\Rightarrow$" weak convergence in the space $D[0, 1]$ under the uniform metric; see Pollard (1984, Chapter 5).

3.1      Consistency. 

The main result of this section is summarized in the following proposition which states the consistency of $\hat\lambda$ for $\lambda^0$.

Proposition 1 Under assumptions A1-A5, the estimated break fractions are consistent. That is, for each $\eta > 0$ and each $\epsilon > 0$, we have, when $T$ is large:

$$
P(|\hat\lambda_k - \lambda_k^0| > \eta) < \epsilon, \qquad (k = 1, \ldots, m).
$$

We outline the main steps of the proof using a few lemmas that are proved in the appendix. Note that $\hat T_i$ need not equal $T_i^0$, so that estimated segments (or regimes) need not correspond to true regimes and the two partitions $\{\hat T_i\}$ and $\{T_i^0\}$ are allowed to overlap. By showing that $\hat\lambda_j \xrightarrow{p} \lambda_j^0$ we, in effect, bound the degree of overlap.


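Denote by $\hat u_t$ the estimated residual for the $t$-th observation and by $d_t$ the difference between the fitted regression "line" and the true regression line. That is,

$$
\hat u_t = y_t - x_t'\hat\beta - z_t'\hat\delta_k, \qquad t \in [\hat T_{k-1} + 1, \hat T_k]
$$

and

$$
d_t = x_t'(\hat\beta - \beta^0) + z_t'(\hat\delta_k - \delta_j^0), \qquad t \in [\hat T_{k-1} + 1, \hat T_k] \cap [T_{j-1}^0 + 1, T_j^0]
$$

for $k, j = 1, \ldots, m+1$. Note that, in general, $d_t$ is defined over $(m+1)^2$ different segments corresponding to each of the possible m-partitions $\{\hat T_i\}$ and $\{T_i^0\}$. Using elementary properties of projections,

$$
\frac{1}{T} \sum_{t=1}^{T} \hat u_t^2 \leq \frac{1}{T} \sum_{t=1}^{T} u_t^2, \tag{4}
$$

and using $\hat u_t = u_t - d_t$,

$$
\frac{1}{T} \sum_{t=1}^{T} \hat u_t^2 = \frac{1}{T} \sum_{t=1}^{T} u_t^2 + \frac{1}{T} \sum_{t=1}^{T} d_t^2 - \frac{2}{T} \sum_{t=1}^{T} u_t d_t. \tag{5}
$$

The proof of Proposition 1 simply uses relations (4) and (5) and the associated limits of $T^{-1} \sum_{t=1}^{T} d_t^2$ and $T^{-1} \sum_{t=1}^{T} u_t d_t$. We start with the latter.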

Lemma 1 Under assumptions A1-A5, we have

$$
\frac{1}{T} \sum_{t=1}^{T} u_t d_t = o_p(1).
$$

In the absence of structural changes ($\delta_j = 0$ for all $j$), $T^{-1} \sum u_t d_t = T^{-1} (\hat\beta - \beta^0)' \sum x_t u_t$, which is readily seen to be $O_p(T^{-1/2})$. In this case, Lemma 1 holds trivially. The proof of this lemma, presented in the appendix, is quite involved when breaks occurring at unknown dates exist. Lemma 1 allows us to state directly the following result applicable in the case of homogeneous disturbances.

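Corollary 1 If $E(u_t^2) = \sigma^2$ for all $t$, then under Assumptions A1-A5, we have as $T \to \infty$:

$$
\hat\sigma^2 = \frac{1}{T} \sum_{t=1}^{T} \hat u_t^2 \xrightarrow{p} \sigma^2.
$$

Proof: Inequality (4) implies that $\hat\sigma^2 \leq \sigma^2 + o_p(1)$ while (5) implies, by Lemma 1, $\hat\sigma^2 = \sigma^2 + \sum d_t^2 / T + o_p(1) \geq \sigma^2 + o_p(1)$. $\Box$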

Lemma 1 together with (4) and (5) implies that, under A1-A5,

$$
\frac{1}{T} \sum_{t=1}^{T} d_t^2 \xrightarrow{p} 0. \tag{6}
$$

The proof of the consistency of $\hat\lambda$ for $\lambda^0$ follows by showing that (6) implies $\hat\lambda \xrightarrow{p} \lambda^0$. More specifically, we prove in the appendix that (6) cannot hold if $\hat\lambda_i \not\xrightarrow{p} \lambda_i^0$ for some $i$. This is stated in the following lemma.

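Lemma 2 Suppose that assumptions A1-A5 are satisfied and that some break date, say $\lambda_j^0$, cannot be consistently estimated, then

$$
\limsup_{T} P\left( \frac{1}{T} \sum_{t=1}^{T} d_t^2 > C \|\delta_j^0 - \delta_{j+1}^0\|^2 \right) > \epsilon_0
$$

for some constant $C > 0$ and some $\epsilon_0 > 0$.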

We are now in the position to prove Proposition 1. Using (5) and Lemmas 1 and 2, and under the supposition that some break date is not consistently estimated, we have the inequality

$$
\frac{1}{T} \sum_{t=1}^{T} \hat u_t^2 > \frac{1}{T} \sum_{t=1}^{T} u_t^2
$$

which holds with probability no less than some $\epsilon_0 > 0$. This is in contradiction with the inequality (4), which holds with probability 1 for all $T$. Hence, all break dates are consistently estimated.
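The contradiction argument can be spelled out step by step. The chain below is only a sketch that combines (5), Lemma 1 and Lemma 2 as stated in the text; the index $j$ labels the break assumed not to be consistently estimated, and the last line is understood along a subsequence of $T$ on which the event in Lemma 2 has probability at least $\epsilon_0$.

```latex
\begin{align*}
\frac{1}{T}\sum_{t=1}^{T}\hat u_t^2
  &= \frac{1}{T}\sum_{t=1}^{T} u_t^2 + \frac{1}{T}\sum_{t=1}^{T} d_t^2
     - \frac{2}{T}\sum_{t=1}^{T} u_t d_t
     && \text{by (5)} \\
  &= \frac{1}{T}\sum_{t=1}^{T} u_t^2 + \frac{1}{T}\sum_{t=1}^{T} d_t^2 + o_p(1)
     && \text{by Lemma 1} \\
  &> \frac{1}{T}\sum_{t=1}^{T} u_t^2 + C\,\|\delta_j^0 - \delta_{j+1}^0\|^2 + o_p(1)
     && \text{with probability at least } \epsilon_0 \text{ (Lemma 2)}.
\end{align*}
```

Since $C \|\delta_j^0 - \delta_{j+1}^0\|^2$ is a fixed positive number, the right-hand side exceeds $T^{-1} \sum u_t^2$ for large $T$, which is incompatible with (4).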

The consistency result for $\hat\lambda$ allows us to state consistency results for the parameter estimates $\hat\beta$ and $\hat\delta_k$ ($k = 1, \ldots, m+1$). The proof of the following proposition, presented in the appendix, uses arguments similar to those involved in the proof of Lemma 2.

Proposition 2 Under assumptions A1-A5, we have

$$
\hat\beta - \beta^0 \xrightarrow{p} 0, \qquad \hat\delta_k - \delta_k^0 \xrightarrow{p} 0 \qquad (k = 1, \ldots, m+1).
$$

That is, the least squares estimators of the regression coefficients are consistent.

3.2      Rates of convergence.

We now consider the rate of convergence of the estimates. We start by showing that $\hat\lambda_k$ converges to its true value at rate $T$. More precisely, we have:

Proposition 3 Under assumptions A1-A5, for every $\eta > 0$, there exists a $C < \infty$, such that for all large $T$,

$$
P(|T(\hat\lambda_k - \lambda_k^0)| > C) < \eta, \qquad (k = 1, \ldots, m).
$$

The proof is presented in the appendix. It is important to remark that the rate $T$ convergence pertains to the estimated break fractions $\hat\lambda_i$ and not to $\hat T_i$, the estimates of the break dates. For the latter, our result states that with high probability the bias is bounded by some constant $C$ that is independent of the sample size $T$, i.e. with high probability, we have $|\hat T_i - T_i^0| \leq C$.

This result about the rate of convergence of the break fractions allows us to derive the rate of convergence and obtain the asymptotic distribution of the estimated coefficients $\hat\beta$ and $\hat\delta$. The relevant results are stated in the following proposition whose proof is similar to Corollary 1 of Bai (1994b) and is therefore omitted.

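Proposition 4 Under assumptions A1-A5, the estimates $\hat\beta$ and $\hat\delta$ in (3) are $\sqrt{T}$-consistent and asymptotically normal such that

$$
\sqrt{T} \begin{pmatrix} \hat\beta - \beta^0 \\ \hat\delta - \delta^0 \end{pmatrix} \xrightarrow{d} N(0, V^{-1} \Phi V^{-1}),
$$

where

$$
V = \mathrm{plim}\; T^{-1} (X, \bar Z^0)'(X, \bar Z^0),
$$

$$
\Phi = \mathrm{plim}\; T^{-1} (X, \bar Z^0)' \Omega (X, \bar Z^0), \tag{9}
$$

and

$$
\Omega = E(UU').
$$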

Note that when the errors are serially uncorrelated and homoskedastic we have $\Phi = \sigma^2 V$ and the asymptotic covariance matrix reduces to $\sigma^2 V^{-1}$, which can be consistently estimated using a consistent estimate of $\sigma^2$. When serial correlation and/or heteroskedasticity is present, a consistent estimate of $\Phi$ can be constructed along the lines of Newey and West (1987) and Andrews (1991). Note that the correction for possible serial correlation can be made assuming identical distributions across segments or allowing the distributions of both the regressors and the errors to differ.
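A robust covariance estimate of the sandwich form $V^{-1} \Phi V^{-1}$ might be computed as sketched below. This is a generic Bartlett-kernel construction in the spirit of Newey and West (1987), offered under assumptions: the function name `hac_covariance`, the fixed truncation lag `lags` (no data-dependent bandwidth is attempted) and the pooling of all segments are choices made here, not prescriptions from the paper. The matrix `W` is assumed to hold the regressors $(X, \bar Z)$ evaluated at the estimated partition, and `resid` the corresponding residuals.

```python
import numpy as np

def hac_covariance(W, resid, lags):
    """Bartlett-kernel estimate of V^{-1} Phi V^{-1} for sqrt(T)(theta_hat - theta)."""
    T = W.shape[0]
    V = W.T @ W / T
    g = W * resid[:, None]                  # row t holds w_t * u_hat_t
    Phi = g.T @ g / T                       # lag-0 (heteroskedasticity) term
    for l in range(1, lags + 1):
        weight = 1.0 - l / (lags + 1.0)     # Bartlett weights keep Phi p.s.d.
        Gamma = g[l:].T @ g[:-l] / T        # sample autocovariance at lag l
        Phi += weight * (Gamma + Gamma.T)
    V_inv = np.linalg.inv(V)
    return V_inv @ Phi @ V_inv

# Illustrative call on simulated regressors and residuals.
rng = np.random.default_rng(0)
W_demo = np.column_stack([np.ones(200), rng.normal(size=200)])
u_demo = rng.normal(size=200)
cov_demo = hac_covariance(W_demo, u_demo, lags=4)
std_errors = np.sqrt(np.diag(cov_demo) / 200)   # divide by T for coefficient s.e.
```

Setting `lags=0` gives a heteroskedasticity-only correction; allowing the error distribution to differ across segments would amount to applying the same construction segment by segment.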
3.3      Limiting Distributions of Break Dates

While  the  consistency  of  the  estimates  of  the  break  fractions  depends  on  the  global 
behavior  of  the  objective  function,  their  limiting  distribution  depends  on  the  local 
behavior  around  the  true  break  dates.  For  this  reason,  the  technical  apparatus  required 
to  prove  consistency  is  rather  different  in  the  multiple  breaks  case  compared  to  the 
single  break  case.  However,  the  analysis  of  the  limiting  distribution  is  similar  in  both 
cases.  In  the  single  break  case,  results  concerning  the  limiting  distribution  of  the 
break  dates  are  discussed  in  Bai  (1994b).  We  provide,  in  this  section,  the  appropriate 
extensions  to  the  case  of  multiple  changes.  Since  the  methods  are  similar,  we  only 
discuss  the  relevant  results  and  refer  the  reader  to  Bai  (1994b)  for  more  details. 

We  discuss  two  types  of  limiting  distributions  for  the  estimates  of  the  break  points. 
The  first  corresponds  to  shifts  of  fixed  magnitude  and  the  second  to  shifts  of  shrinking 
magnitudes  as  the  sample  size  increases. 

3.3.1      Limiting  Distributions  with  Fixed  Shifts. 

To  study  the  limiting  distribution  in  the  case  of  fixed  shifts,  we  need  the  following 
stronger  assumptions. 

A6. For regime $i$, $\{z_t;\ T_{i-1}^0 + 1 \leq t \leq T_i^0\}$ is a strictly stationary process ($i = 1, \ldots, m+1$).

We say that $z_t$ follows process $i$ when the time index $t$ belongs to the $i$th regime. Note that for different regimes, $\{z_t, u_t\}$ may follow different stochastic processes and, hence, piecewise stationarity is allowed. Also, no stationarity assumption need be imposed on the $x_t$'s. To characterize the limiting distribution, we first define a stochastic process $W^{(i)}(k)$ on the set of integers as follows: $W^{(i)}(0) = 0$, $W^{(i)}(k) = W_1^{(i)}(k)$ for $k < 0$, and $W^{(i)}(k) = W_2^{(i)}(k)$ for $k > 0$ where, for $i = 1, \ldots, m$:

$$
W_1^{(i)}(k) = -\Delta_i' \sum_{t=k+1}^{0} z_t^{(i)} z_t^{(i)\prime} \Delta_i + 2\Delta_i' \sum_{t=k+1}^{0} z_t^{(i)} u_t^{(i)}, \qquad k = -1, -2, \ldots \tag{10}
$$

$$
W_2^{(i)}(k) = -\Delta_i' \sum_{t=1}^{k} z_t^{(i+1)} z_t^{(i+1)\prime} \Delta_i - 2\Delta_i' \sum_{t=1}^{k} z_t^{(i+1)} u_t^{(i+1)}, \qquad k = 1, 2, \ldots \tag{11}
$$

with  A,  =  (£f+1  —  Sf)  and  where  {z\  ,  u\ *'}  follows  the  ith  stochastic  process  for  all  t. 
For  example,  when  (z\  ,u\1')  is  independent  over  time,  the  process  W^  is  a  two-sided 
random  walk  with  (stochastic)  drift. 
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Proposition  5    Under  assumptions  A1-A6,  and  assuming  that  [A^*]2  ±  A'^tUt  has  a 
continuous  distribution  (for  all  t),  then 

fi-Tf  -^argmaxH^W(jfc)        (i  =  l,...,m). 

Furthermore,  the  estimated  break  points  are  asymptotically  independent  of  each  other 
and  of  the  estimated  regression  parameters. 

The assumption that $\{[\Delta_i' z_t]^2 \pm \Delta_i' z_t u_t\}$ has a continuous distribution ensures the
uniqueness of the maximum of $W^{(i)}$ (with probability 1). Note that the limiting
distributions of the break point estimates in the multiple break model are the same
as in a single break model. This is basically due to the fact that these limiting
distributions are determined by the local behavior of the objective function because
of the fast rate of convergence of the estimated break points. Accordingly, one can
view the limiting distribution of $\hat T_i - T_i^0$ as being solely determined by the segment
$[T_{i-1}^0 + 1, T_{i+1}^0]$.

Because the estimates of the break fractions are consistent at rate T, they are
essentially determined by a bounded number of observations with large probability,
no matter how large the sample is. This explains the asymptotic independence of the
estimated break points, since the contributing segments are increasingly far apart as the
sample size increases. The regression coefficients, on the other hand, are estimated
using the entire data set or at least a positive fraction of it. One can then think of possibly
deleting the finite number of observations that determine the distribution of the break
points when estimating the regression coefficients without affecting their limiting
distribution. Hence, the information determining the distribution of the break points
and the remaining coefficients can be viewed as coming from non-overlapping observations,
from which the asymptotic independence of the break-point estimates and the coefficient
estimates follows.

The above result, though of definite theoretical interest, is perhaps of limited
practical use because of the dependence of the limiting distribution of $\hat T_i - T_i^0$ on the
distribution of $\{z_t^{(i)}, u_t^{(i)}\}$ for each segment $i$ (though it is independent of $x_t$ and $\beta$).
Hinkley (1971) provides an analytical expression for the probability density function
in the case where $z_t = 1$ and $u_t$ is i.i.d. normal. In general, if one knows the distribution
of $\{z_t^{(i)}, u_t^{(i)}\}$, the stochastic process $W^{(i)}$ and the location of its maximum can be
easily simulated.
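
A simulation of this kind can be sketched as follows for the simplest case, assuming a pure mean shift ($z_t = 1$) of size `delta` and i.i.d. normal errors. The function names, the truncation point `K` and the number of replications are illustrative assumptions, not part of the paper.

```python
import random

def draw_argmax(delta, sigma=1.0, K=200, rng=random):
    """One draw of argmax_k W(k), truncated to |k| <= K (K is an assumed cutoff)."""
    best_k, best_w = 0, 0.0                      # W(0) = 0
    s = 0.0
    for j in range(1, K + 1):                    # k = -j: W_1(k) = -delta^2 j + 2 delta sum(u)
        s += rng.gauss(0.0, sigma)
        w = -delta * delta * j + 2.0 * delta * s
        if w > best_w:
            best_k, best_w = -j, w
    s = 0.0
    for j in range(1, K + 1):                    # k = +j: W_2(k) = -delta^2 j - 2 delta sum(u)
        s += rng.gauss(0.0, sigma)
        w = -delta * delta * j - 2.0 * delta * s
        if w > best_w:
            best_k, best_w = j, w
    return best_k

def simulate_argmax(delta, reps=2000, seed=0, **kwargs):
    """Independent draws of the (truncated) limiting variable for a given shift size."""
    rng = random.Random(seed)
    return [draw_argmax(delta, rng=rng, **kwargs) for _ in range(reps)]

draws = simulate_argmax(delta=1.0)
share_exact = sum(d == 0 for d in draws) / len(draws)   # simulated P(argmax = 0)
```

The empirical quantiles of `draws` then approximate those of $\hat T_i - T_i^0$ under these assumptions; larger shifts concentrate the draws more tightly around zero.
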

3.3.2     Limiting Distributions with Shrinking Shifts.

An alternative strategy is to consider an asymptotic framework where the magnitudes
of the shifts converge to zero as the sample size increases. Even though the setup is
particularly well suited to provide an adequate approximation to the exact distribution
when the shifts are small, it remains adequate even for moderate shifts. This alternative
asymptotic framework also allows a substantial relaxation of the assumption
concerning the distribution of $\{z_t, u_t\}$. Moreover, the resulting limiting distribution is
independent of the specific distribution of this pair.

We  provide,  in  this  section,  a  description  of  the  results  when  the  data  are  not 
trending.  The  required  conditions  are  stated  in  the  next  assumption. 

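A7. a) Let $\Delta T_i^0 = T_i^0 - T_{i-1}^0$. The process $\{(z_t, u_t);\ T_{i-1}^0 + 1 \le t \le T_i^0\}$ is such that

$$\operatorname{plim}\,(\Delta T_i^0)^{-1} \sum_{t=T_{i-1}^0+1}^{T_{i-1}^0+[s\Delta T_i^0]} E z_t z_t' = s Q_i \quad \text{and} \quad \operatorname{plim}\,(\Delta T_i^0)^{-1} \sum_{t=T_{i-1}^0+1}^{T_{i-1}^0+[s\Delta T_i^0]} E u_t^2 = s \sigma_i^2,$$

uniformly in $s \in [0, 1]$ for $i = 1, \ldots, m+1$.

b) The following limit exists:

$$\operatorname{plim}\,(\Delta T_i^0)^{-1} \sum_{r=T_{i-1}^0+1}^{T_{i-1}^0+[s\Delta T_i^0]}\ \sum_{t=T_{i-1}^0+1}^{T_{i-1}^0+[s\Delta T_i^0]} E(z_r z_t' u_r u_t) = s \Omega_i \qquad (i = 1, \ldots, m+1),$$

uniformly in $s \in [0, 1]$.

c) A functional central limit theorem holds for $\{z_t u_t\}$. That is, as $\Delta T_i^0 \to \infty$:

$$(\Delta T_i^0)^{-1/2} \sum_{t=T_{i-1}^0+1}^{T_{i-1}^0+[s\Delta T_i^0]} z_t u_t \Rightarrow B_i(s) \qquad (i = 1, \ldots, m+1),$$

where $B_i(s)$ is a multivariate Gaussian process on $[0, 1]$ with mean zero and covariance
$E[B_i(s) B_i(u)'] = \min\{s, u\}\,\Omega_i$.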

The next assumption concerns the behavior of the shifts as T increases. Its role
lies in the fact that if the shifts decrease as T increases, more observations are needed
around each break point to discern it. This makes possible the application of a
central limit theorem.

A8. Let $\Delta_{T,i} = \delta_{T,i+1}^0 - \delta_{T,i}^0$ ($i = 1, \ldots, m$). Assume $\Delta_{T,i} = v_T \Delta_i$ for some $\Delta_i$
independent of $T$, where $v_T$ is a scalar satisfying
$v_T \to 0$ and $T^{1/2-\alpha} v_T \to \infty$, for some $\alpha \in (0, 1/2)$.
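For $i = 1, \ldots, m$, let:

$$\xi_i = \Delta_i' Q_{i+1} \Delta_i / \Delta_i' Q_i \Delta_i,$$

$$\phi_{i,1}^2 = \Delta_i' \Omega_i \Delta_i / \Delta_i' Q_i \Delta_i,$$

$$\phi_{i,2}^2 = \Delta_i' \Omega_{i+1} \Delta_i / \Delta_i' Q_{i+1} \Delta_i,$$

and let $W_1^{(i)}(s)$ and $W_2^{(i)}(s)$ be independent standard Wiener processes defined on
$[0, \infty)$, starting at the origin when $s = 0$. These processes are also independent across
$i$. Define, for $i = 1, \ldots, m$:

$$Z^{(i)}(s) = \begin{cases} \phi_{i,1} W_1^{(i)}(-s) - |s|/2, & \text{if } s \le 0, \\ \sqrt{\xi_i}\,\phi_{i,2} W_2^{(i)}(s) - \xi_i |s|/2, & \text{if } s > 0. \end{cases} \tag{12}$$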

We  are  now  in  a  position  to  state  the  following  result. 


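Proposition 6 Under assumptions A1-A5, A7-A8,

$$(\Delta_i' Q_i \Delta_i)\, v_T^2\, (\hat T_i - T_i^0) \xrightarrow{d} \arg\max_s Z^{(i)}(s) \qquad (i = 1, \ldots, m). \tag{13}$$

The limiting distribution is the same as that occurring in a single break model.
The density function of $\arg\max_s Z^{(i)}(s)$ is derived in Bai (1994b). When the limits $Q_i$,
$\Omega_i$ and $\sigma_i^2$ are the same for adjacent $i$'s, $\xi_i = 1$ and $\phi_{i,1} = \phi_{i,2} = \phi_i$, in which case the
limiting distribution (13) reduces to:

$$(\Delta_i' Q \Delta_i)\, v_T^2\, (\hat T_i - T_i^0) \xrightarrow{d} \arg\max_s \{\phi_i W^{(i)}(s) - |s|/2\} = \phi_i^2 \arg\max_s \{W^{(i)}(s) - |s|/2\},$$

or

$$\frac{(\Delta_i' Q \Delta_i)^2}{\Delta_i' \Omega \Delta_i}\, v_T^2\, (\hat T_i - T_i^0) \xrightarrow{d} \arg\max_s \{W^{(i)}(s) - |s|/2\}. \tag{14}$$

Finally, when the errors are uncorrelated, we have $\Omega = \sigma^2 Q$ and the limiting result
reduces to

$$\frac{\Delta_i' Q \Delta_i}{\sigma^2}\, v_T^2\, (\hat T_i - T_i^0) \xrightarrow{d} \arg\max_s \{W^{(i)}(s) - |s|/2\}. \tag{15}$$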

In this case, the limiting density function is symmetric about the origin. This case
has been analyzed by Bhattacharya (1987), Picard (1985) and Yao (1987) for a single
break. The cumulative distribution function of $\arg\max_s \{W^{(i)}(s) - |s|/2\}$ is (see Yao
(1987)):

$$H(x) = 1 + (2\pi)^{-1/2} \sqrt{x}\, e^{-x/8} - \tfrac{1}{2}(x + 5)\,\Phi(-\sqrt{x}/2) + \tfrac{3}{2}\, e^{x}\, \Phi(-3\sqrt{x}/2),$$

for $x > 0$ and $H(x) = 1 - H(-x)$ for $x < 0$, where $\Phi(x)$ is the distribution function of a
standard normal variable. For instance, the 95% and 97.5% quantiles are 7.7 and
11.0, respectively.
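
The distribution function above is easy to evaluate numerically. The following is a minimal sketch using only the Python standard library; the function names and the bisection bounds are assumptions made for illustration.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def H(x):
    """Distribution function of argmax_s {W(s) - |s|/2}, using H(x) = 1 - H(-x) for x < 0."""
    if x < 0:
        return 1.0 - H(-x)
    r = math.sqrt(x)
    return (1.0
            + r * math.exp(-x / 8.0) / math.sqrt(2.0 * math.pi)
            - 0.5 * (x + 5.0) * norm_cdf(-r / 2.0)
            + 1.5 * math.exp(x) * norm_cdf(-1.5 * r))

def H_quantile(p, lo=-200.0, hi=200.0, tol=1e-10):
    """Invert H by bisection on [lo, hi] (bounds assumed wide enough for the p of interest)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if H(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Evaluating `H(7.7)` and `H(11.0)` gives values close to 0.95 and 0.975, in line with the quantiles quoted in the text.
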

The results discussed above allow easy construction of confidence intervals for
the break dates. All that is needed is to construct consistent estimates of the various
parameters: $T^{-1} \sum_{t=1}^{T} z_t z_t'$ for $Q$ and $T^{-1} \sum_{t=1}^{T} \hat u_t^2$ for $\sigma^2$. The parameters $v_T \Delta_i$ can be
consistently estimated by the differences in the coefficient estimates $\hat\delta_{i+1} - \hat\delta_i$. When serial
correlation is present, $\Omega$ can be estimated using a kernel-based method as discussed
in Andrews (1991). Note that when the segments are not homogeneous, obtaining
consistent estimates of these quantities is still possible using data over the relevant
subsamples only.
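
As an illustration, a confidence interval based on (15) can be sketched as follows, assuming homogeneous segments and uncorrelated errors. The function name, its arguments and the outward rounding of the half-width are assumptions made for this example; `c = 11.0` is the 97.5% quantile quoted above, which by symmetry yields a two-sided 95% interval.

```python
import math

def break_date_ci(t_hat, delta_hat, Q_hat, sigma2_hat, c=11.0):
    """Interval for a break date from (15): |T_hat - T0| <= c / (delta' Q delta / sigma^2).

    delta_hat : list, estimated shift (difference of adjacent coefficient estimates)
    Q_hat     : list of lists, estimate of Q, e.g. T^{-1} sum z_t z_t'
    sigma2_hat: float, estimate of the error variance
    """
    q = len(delta_hat)
    quad = sum(delta_hat[a] * Q_hat[a][b] * delta_hat[b]
               for a in range(q) for b in range(q))
    scale = quad / sigma2_hat
    half_width = math.ceil(c / scale)      # rounding outward is an assumed convention
    return t_hat - half_width, t_hat + half_width

# Mean shift of two standard deviations at an estimated date of 100.
print(break_date_ci(100, [2.0], [[1.0]], 1.0))
```
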

The limiting distribution in the case of trending regressors is discussed in Bai
(1994b) for a single structural change model. His results remain valid for multiple
breaks. We omit the details and refer the reader to that paper.

4     Test  Statistics  for  Multiple  Breaks. 

4.1      A  Test  of  no  break  versus  some  fixed  number  of  breaks. 

We consider the sup F type test of no structural break ($m = 0$) versus the alternative
hypothesis that there are $m = k$ breaks. Let $(T_1, \ldots, T_k)$ be a partition such that $T_i =
[T\lambda_i]$ ($i = 1, \ldots, k$). Again let $\bar Z$ denote the matrix which diagonally partitions $Z$ at
$(T_1, \ldots, T_k)$. Let $R$ be the conventional matrix such that $(R\delta)' = (\delta_1' - \delta_2', \ldots, \delta_k' - \delta_{k+1}')$.
Define

$$F_T(\lambda_1, \ldots, \lambda_k; q) = \left(\frac{T - (k+1)q - p}{kq}\right) \frac{\hat\delta' R' (R (\bar Z' M_X \bar Z)^{-1} R')^{-1} R \hat\delta}{SSR_k}. \tag{16}$$

Here $SSR_k$ is the sum of squared residuals under the alternative hypothesis, which
depends on $(T_1, \ldots, T_k)$; $q$ is the number of regressors whose coefficients are subject
to change. The statistic $F_T$ is simply the conventional F-statistic for testing $\delta_1 =
\cdots = \delta_{k+1}$ against $\delta_i \ne \delta_{i+1}$ for some $i$ given $T_1, \ldots, T_k$. To carry out the asymptotic
analysis, we need to impose some restrictions on the possible values of the break dates.
In particular, we need to restrict each break date to be asymptotically distinct and
bounded away from the boundaries of the sample. To this effect, we define the following set
for some arbitrarily small positive number $\epsilon$:

$$\Lambda_\epsilon = \{(\lambda_1, \ldots, \lambda_k);\ |\lambda_{i+1} - \lambda_i| \ge \epsilon,\ \lambda_1 \ge \epsilon,\ \lambda_k \le 1 - \epsilon\}.$$

The sup F type test statistic is then defined as

$$\sup F_T(k; q) = \sup_{(\lambda_1, \ldots, \lambda_k) \in \Lambda_\epsilon} F_T(\lambda_1, \ldots, \lambda_k; q).$$

This test is a generalization of the sup F test considered by Andrews (1993) and others
for the case $k = 1$.
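
For a pure mean-shift model ($z_t = 1$, $q = 1$, $p = 0$) and $k = 1$, the statistic can be computed directly. The sketch below relies on the standard identity that the quadratic form in the numerator of the F-statistic equals the drop in the sum of squared residuals, $SSR_0 - SSR_1$; the function name and the trimming rule `ceil(eps * T)` are illustrative assumptions.

```python
import math

def sup_f_one_break(y, eps=0.05):
    """Return (sup F_T(1; 1), break date) for a change in mean, searching T_1 in the trimmed range."""
    T = len(y)
    c1, c2 = [0.0], [0.0]                        # prefix sums of y and y^2
    for v in y:
        c1.append(c1[-1] + v)
        c2.append(c2[-1] + v * v)

    def ssr(i, j):
        """Sum of squared deviations from the mean for observations i+1..j."""
        s = c1[j] - c1[i]
        return (c2[j] - c2[i]) - s * s / (j - i)

    ssr0 = ssr(0, T)
    h = max(1, math.ceil(eps * T - 1e-9))        # minimal segment length implied by eps
    best_f, best_t = float("-inf"), None
    for t1 in range(h, T - h + 1):
        ssr1 = ssr(0, t1) + ssr(t1, T)
        f = (T - 2) * (ssr0 - ssr1) / ssr1       # (T - (k+1)q - p)/(kq) with k = q = 1, p = 0
        if f > best_f:
            best_f, best_t = f, t1
    return best_f, best_t
```

Because the statistic is a decreasing function of $SSR_1$, the maximizing date coincides with the least-squares break date estimate.
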

The  limiting  distribution  of  the  test  depends  on  the  nature  of  the  regressors  and 
the  presence  or  absence  of  serial  correlation  and  heterogeneity  in  the  residuals.  We 
consider  the  case  where  the  following  additional  assumptions  are  imposed. 

A9. Let $w_t = (x_t', z_t')'$ and let $Q$ be some positive definite matrix. We assume that

$$\operatorname{plim}_{T \to \infty}\, T^{-1} \sum_{t=1}^{[Ts]} w_t w_t' = sQ,$$

uniformly in $s \in [0, 1]$. Note that A9 precludes the presence of trending regressors.
Extensions to the general case where $\operatorname{plim}_{T \to \infty} T^{-1} \sum_{t=1}^{[Ts]} w_t w_t' = Q(s)$, which allows
trending regressors, are beyond the scope of the present paper. They will be discussed
in a separate paper by the authors.

A10. The disturbances $\{u_t\}$ form an array of martingale differences relative to $\{\mathcal{F}_t\}$
where $\mathcal{F}_t = \sigma$-field $\{\ldots, w_{t-1}, w_t, \ldots, u_{t-2}, u_{t-1}\}$. Also, $E[u_t^2] = \sigma^2$ for all $t$ and a
functional central limit theorem holds for $\{w_t u_t\}$ such that

$$T^{-1/2} \sum_{t=1}^{[Tr]} w_t u_t \Rightarrow \sigma Q^{1/2} W^*(r),$$

where $W^*(r)$ is a $(p + q)$ vector of independent Wiener processes.

The  case  where  {ut}  satisfies  the  general  conditions  stated  in  Assumption  A4 
is  discussed  in  Section  4.4  below.  We  show  how  the  results  remain  valid  provided 
appropriate  modifications  are  made  to  account  for  the  effect  of  serial  correlation  on 
the  asymptotic  distributions. 

The following proposition, proved in the appendix, relates to the asymptotic
distribution of the $\sup F_T(k; q)$ test.

Proposition 7 Under assumptions A9-A10:

$$\sup F_T(k; q) \xrightarrow{d} \sup F_{k,q} \stackrel{\mathrm{def}}{=} \sup_{(\lambda_1, \ldots, \lambda_k) \in \Lambda_\epsilon} F(\lambda_1, \ldots, \lambda_k; q),$$

where

$$F(\lambda_1, \ldots, \lambda_k; q) = \frac{1}{kq} \sum_{i=1}^{k} \frac{[\lambda_i W_q(\lambda_{i+1}) - \lambda_{i+1} W_q(\lambda_i)]'\,[\lambda_i W_q(\lambda_{i+1}) - \lambda_{i+1} W_q(\lambda_i)]}{\lambda_i \lambda_{i+1} (\lambda_{i+1} - \lambda_i)},$$

$W_q(\cdot)$ is a $q$-vector of independent Wiener processes on $[0, 1]$ and $\lambda_{k+1} = 1$.

Note that the asymptotic distribution of the test statistic depends on the value of
$\epsilon$ in $\Lambda_\epsilon$. As $\epsilon$ converges to zero, the test statistic diverges to infinity. Thus a small
positive value instead of zero can improve the power significantly; see Andrews (1993)
for further details. In what follows, we have adopted a value $\epsilon = 0.05$. No critical
values for sup F tests for $k \ge 2$ are available except those of Garcia and Perron (1994)
who provide a partial tabulation for $k = 2$ and $q = 1$.

Asymptotic critical values are obtained via simulations, using an approach similar
to that in Andrews (1993) and Garcia and Perron (1994). The Wiener process $W_q(\lambda)$
is approximated by the partial sums $n^{-1/2} \sum_{i=1}^{[n\lambda]} e_i$ with $e_i$ i.i.d. $N(0, I_q)$ and $n =
1{,}000$. The number of replications is 10,000. For each replication, the supremum of
$F(\lambda_1, \ldots, \lambda_k; q)$ with respect to $(\lambda_1, \ldots, \lambda_k)$ over the set $\Lambda_\epsilon$ is obtained via a dynamic
programming algorithm (see the appendix for further details). We present, in Table 1,
critical values covering cases with up to 9 breaks (i.e., up to 10 regimes, $k = 1, \ldots, 9$)
and up to 10 regressors ($q = 1, \ldots, 10$) whose coefficients are the object of the test. The
reported values are scaled up by $q$ for comparison purposes. The column corresponding
to one break ($k = 1$) can also be found in Andrews (1993).
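
The simulation described above can be sketched as follows. This is a generic dynamic programming recursion written for illustration, not necessarily the algorithm of the appendix; it exploits the fact that each summand of $F(\lambda_1, \ldots, \lambda_k; q)$ depends only on an adjacent pair $(\lambda_i, \lambda_{i+1})$. The function names, the grid size `n` and the seed handling are assumptions.

```python
import math
import random

def sup_f_limit_draw(k, q, eps=0.05, n=200, rng=random):
    """One draw approximating sup F_{k,q} on a grid of n points; needs (k + 1) * eps <= 1."""
    h = max(1, math.ceil(eps * n - 1e-9))          # minimal spacing in grid steps
    W = [[0.0] * q]                                # W[j] approximates W_q(j / n)
    for _ in range(n):
        W.append([w + rng.gauss(0.0, 1.0) / math.sqrt(n) for w in W[-1]])

    def g(a, b):
        """Summand of F for the adjacent pair (lambda_i, lambda_{i+1}) = (a/n, b/n)."""
        la, lb = a / n, b / n
        num = sum((la * W[b][d] - lb * W[a][d]) ** 2 for d in range(q))
        return num / (la * lb * (lb - la))

    neg_inf = float("-inf")
    # value[a]: best partial sum of summands when the most recent grid point is a
    value = [0.0 if a >= h else neg_inf for a in range(n + 1)]
    for _ in range(k):
        new = [neg_inf] * (n + 1)
        for b in range(2 * h, n + 1):
            new[b] = max(value[a] + g(a, b) for a in range(h, b - h + 1))
        value = new
    return value[n] / (k * q)                      # lambda_{k+1} = 1 is grid point n

def simulated_critical_value(k, q, level=0.95, reps=1000, seed=0, **kwargs):
    """Empirical quantile of repeated draws (unscaled; multiply by q to match a q-scaled table)."""
    rng = random.Random(seed)
    draws = sorted(sup_f_limit_draw(k, q, rng=rng, **kwargs) for _ in range(reps))
    return draws[min(reps - 1, int(level * reps))]
```

Each stage of the recursion costs on the order of $n^2$ evaluations, so the total work grows linearly in $k$ rather than exponentially as an exhaustive search over all partitions would.
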

4.2     A  double  maximum  test. 

The test discussed in the previous section requires the specification of the number of
breaks, $m$, under the alternative hypothesis. It is of interest to consider a test of no
structural break against an unknown number of breaks given some upper bound $M$ on
the number of breaks permitted. A new test, called the double maximum
test, can now be defined as

$$\mathrm{Dmax}\,F_T(M, q) = \max_{1 \le m \le M}\ \sup_{(\lambda_1, \ldots, \lambda_m) \in \Lambda_\epsilon} F_T(\lambda_1, \ldots, \lambda_m; q). \tag{17}$$

For a fixed $m$, $F(\lambda_1, \ldots, \lambda_m; q)$ is the sum of $m$ dependent chi-square random variables
with $q$ degrees of freedom, each one divided by $m$. This scaling by $m$ can be viewed,
in some sense, as a prior imposed to account for the fact that as $m$ increases a fixed
sample of data becomes less informative about the hypotheses being confronted.

The last column of Table 1 reports the critical values of this test for $M = 5$ and
$\epsilon = 0.05$. This should be sufficient for most empirical applications. In any event, the
critical values vary little for choices of the upper bound $M$ larger than 5.
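
A minimal sketch of the double maximum statistic for a pure mean-shift model ($q = 1$, $p = 0$) follows. It assumes the standard identity that the numerator of the F-statistic equals $SSR_0 - SSR_m$, so that the supremum over partitions is attained at the global minimizer of the sum of squared residuals, which is found here by a generic dynamic programming recursion. The function name and the trimming rule are illustrative assumptions.

```python
import math

def dmax_f_mean_shift(y, M=5, eps=0.05):
    """Return (Dmax statistic, {m: sup F_T(m; 1)}) for a change-in-mean model."""
    T = len(y)
    h = max(1, math.ceil(eps * T - 1e-9))          # minimal segment length implied by eps
    c1, c2 = [0.0], [0.0]
    for v in y:
        c1.append(c1[-1] + v)
        c2.append(c2[-1] + v * v)

    def D(i, j):
        """Minimized SSR (deviations from the mean) for observations i+1..j."""
        s = c1[j] - c1[i]
        return (c2[j] - c2[i]) - s * s / (j - i)

    inf = float("inf")
    ssr0 = D(0, T)
    # best[j]: minimal SSR for observations 1..j given the current number of breaks
    best = [D(0, j) if j >= h else inf for j in range(T + 1)]
    sup_f = {}
    for m in range(1, M + 1):
        new = [inf] * (T + 1)
        for j in range((m + 1) * h, T + 1):
            new[j] = min(best[i] + D(i, j) for i in range(m * h, j - h + 1))
        best = new
        if best[T] == inf:                         # sample too short for m breaks
            break
        sup_f[m] = (T - (m + 1)) / m * (ssr0 - best[T]) / best[T]
    return max(sup_f.values()), sup_f
```

The same pass delivers $\sup F_T(m; 1)$ for every $m \le M$, so the double maximum costs no more than the test with $M$ breaks.
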

4.3     Test of $\ell$ versus $\ell+1$ breaks.

This section considers a test of the null hypothesis that $\ell$ unknown breaks exist against
the alternative that an additional break exists. Ideally, one would base the test on
the difference between the sum of squared residuals obtained allowing $\ell$ breaks and
that obtained allowing $\ell + 1$ breaks. The limiting distribution of this test statistic
is, however, difficult to use. Here we pursue a different strategy. For the model with
$\ell$ breaks, the estimated break points, denoted by $\hat T_1, \ldots, \hat T_\ell$, are obtained by a global
minimization of the sum of squared residuals. Our test strategy proceeds conditional
on the $\ell$ estimated break points under the null hypothesis by testing each of the $(\ell + 1)$
segments for the presence of an additional break.

The test amounts to the application of $(\ell + 1)$ tests of the null hypothesis of no
structural change versus the alternative hypothesis of a single change. The test is therefore
applied to each segment containing the observations $\hat T_{i-1}$ to $\hat T_i$ ($i = 1, \ldots, \ell + 1$) using
again the convention that $\hat T_0 = 1$ and $\hat T_{\ell+1} = T$. We conclude for a rejection in favor of
a model with $(\ell + 1)$ breaks if the overall minimal value of the sum of squared residuals
(over all segments where an additional break is included) is sufficiently smaller than
the sum of squared residuals from the $\ell$ break model. The break date thus selected is
the one associated with this overall minimum. More precisely, the test is defined by:

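$$F_T(\ell + 1|\ell) = \Big\{ S_T(\hat T_1, \ldots, \hat T_\ell) - \min_{1 \le i \le \ell+1}\ \inf_{\tau \in \Lambda_{i,\eta}} S_T(\hat T_1, \ldots, \hat T_{i-1}, \tau, \hat T_i, \ldots, \hat T_\ell) \Big\} \Big/ \hat\sigma^2, \tag{18}$$

where

$$\Lambda_{i,\eta} = \{\tau;\ \hat T_{i-1} + (\hat T_i - \hat T_{i-1})\eta \le \tau \le \hat T_i - (\hat T_i - \hat T_{i-1})\eta\}, \tag{19}$$

and $\hat\sigma^2$ is a consistent estimate of $\sigma^2$ under the null hypothesis. Note that for $i =
1$, $S_T(\hat T_1, \ldots, \hat T_{i-1}, \tau, \hat T_i, \ldots, \hat T_\ell)$ is understood as $S_T(\tau, \hat T_1, \ldots, \hat T_\ell)$ and for $i = \ell + 1$ as
$S_T(\hat T_1, \ldots, \hat T_\ell, \tau)$.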
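
The statistic in (18)-(19) can be sketched for a change-in-mean model as follows. The function name is hypothetical, segments are indexed so that segment $i$ holds observations $\hat T_{i-1} + 1$ to $\hat T_i$ with $\hat T_0 = 0$ (an assumed coding convention), and $\hat\sigma^2$ is taken to be the sum of squared residuals under the null divided by $T$.

```python
import math

def seq_f_test(y, breaks, eta=0.05):
    """Return (F_T(l+1 | l), selected extra break date) given l estimated break dates."""
    T = len(y)
    c1, c2 = [0.0], [0.0]
    for v in y:
        c1.append(c1[-1] + v)
        c2.append(c2[-1] + v * v)

    def D(i, j):
        """SSR about the segment mean for observations i+1..j."""
        s = c1[j] - c1[i]
        return (c2[j] - c2[i]) - s * s / (j - i)

    bounds = [0] + sorted(breaks) + [T]
    ssr_null = sum(D(a, b) for a, b in zip(bounds, bounds[1:]))
    sigma2 = ssr_null / T                         # consistent under the null hypothesis
    best_gain, best_tau = 0.0, None
    for a, b in zip(bounds, bounds[1:]):
        trim = math.ceil((b - a) * eta - 1e-9)    # trimming within the segment, as in (19)
        lo, hi = max(a + trim, a + 1), min(b - trim, b - 1)
        for tau in range(lo, hi + 1):
            gain = D(a, b) - D(a, tau) - D(tau, b)   # drop in SSR from adding a break at tau
            if gain > best_gain:
                best_gain, best_tau = gain, tau
    return best_gain / sigma2, best_tau
```

Only the segment that receives the extra break changes its contribution to the total sum of squared residuals, which is why the search can be carried out segment by segment.
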

In the case of a pure structural change, an alternative interpretation of the test
can be given as follows. Let $D(i, j)$ be the minimized sum of squared residuals for
the segment containing observations from $(i + 1)$ to $j$; then the test statistic can be
written as

$$F_T(\ell + 1|\ell) = \sup_{1 \le i \le \ell+1}\ \sup_{\tau \in \Lambda_{i,\eta}} \{D(\hat T_{i-1}, \hat T_i) - D(\hat T_{i-1}, \tau) - D(\tau, \hat T_i)\} / \hat\sigma^2. \tag{20}$$

This follows from $S_T(\hat T_1, \ldots, \hat T_\ell) = D(0, \hat T_1) + D(\hat T_1, \hat T_2) + \cdots + D(\hat T_\ell, T)$ and a similar
expression for $S_T(\hat T_1, \ldots, \hat T_i, \tau, \hat T_{i+1}, \ldots, \hat T_\ell)$, so that many common terms cancel.
Under assumptions A9-A10, standard arguments show that

$$\hat\sigma^{-2} \sup_{\tau \in \Lambda_{i,\eta}^0} \{D(T_{i-1}^0, T_i^0) - D(T_{i-1}^0, \tau) - D(\tau, T_i^0)\} \xrightarrow{d} \sup_{\eta \le \mu \le 1-\eta} \frac{\|W_q(\mu) - \mu W_q(1)\|^2}{\mu(1 - \mu)}, \tag{21}$$

where, as before, $W_q(\cdot)$ is a $q$-vector of independent Wiener processes on $[0, 1]$ and $\Lambda_{i,\eta}^0$
is as defined in (19) with $\hat T_i$ replaced by $T_i^0$. Under the null hypothesis, Proposition 3
asserts that $\hat T_i = T_i^0 + O_p(1)$. Using this result, it is not difficult to show that the weak
convergence in (21) also holds with $T_{i-1}^0$ and $T_i^0$ replaced by $\hat T_{i-1}$ and $\hat T_i$, respectively.
In addition, because over different regimes $D(\cdot, \cdot)$ are computed using non-overlapping
observations, the weak limits in (21) for different $i$'s are independent. Thus the limit
of (20) is the maximum of $\ell + 1$ independent random variables in the form of (21),
and we have the following result:

Proposition 8 Under assumptions A9-A10:

$$\lim_{T \to \infty} P(F_T(\ell + 1|\ell) \le x) = G_{q,\eta}(x)^{\ell+1}, \tag{22}$$

where $G_{q,\eta}(x)$ is the distribution function of the random variable

$$\sup_{\eta \le \mu \le 1-\eta} \frac{\|W_q(\mu) - \mu W_q(1)\|^2}{\mu(1 - \mu)}.$$

The critical values of this test for different values of $\ell$ can be obtained from the
distribution function $G_{q,\eta}(x)$. A partial tabulation of some percentage points can be
found in DeLong (1981) and Andrews (1993) (see also the first column of our Table
1). However, the grid presented is not fine enough to allow obtaining the relevant
percentage points of $G_{q,\eta}(x)^{\ell+1}$. Accordingly, we provide a full set of critical values
in Table 2 calculated with $\eta = .05$. These were obtained using a simulation method
similar to that used for the construction of Table 1.
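
To make the role of the power $\ell + 1$ concrete: since the limit is the maximum of $\ell + 1$ independent variables with common distribution function $G_{q,\eta}$, a critical value $c_\alpha$ of asymptotic level $\alpha$ solves

$$G_{q,\eta}(c_\alpha)^{\ell+1} = 1 - \alpha \quad \Longleftrightarrow \quad G_{q,\eta}(c_\alpha) = (1 - \alpha)^{1/(\ell+1)}.$$

For $\alpha = 0.05$, the required quantile of $G_{q,\eta}$ is therefore $0.95$ for $\ell = 0$, $0.95^{1/2} \approx 0.9747$ for $\ell = 1$, $0.95^{1/3} \approx 0.9830$ for $\ell = 2$ and $0.95^{1/5} \approx 0.9898$ for $\ell = 4$. Tabulations limited to a few conventional percentage points do not cover these intermediate levels.
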

Note that $\hat\sigma^2$ is only required to be consistent under the null hypothesis for the
validity of the stated asymptotic distribution. The test may, however, have better
power if $\hat\sigma^2$ is also consistent under the alternative hypothesis. If the alternative is true, $\hat\sigma^2$
constructed under the null hypothesis will overestimate $\sigma^2$. The test statistic is then
biased downward, thereby decreasing its power. A consistent estimator under both
the null and the alternative hypothesis is given by

$$\hat\sigma^2 = \frac{1}{T} S_T(\hat T_1, \ldots, \hat T_{\ell+1}),$$

where $\hat T_1, \ldots, \hat T_{\ell+1}$ are the estimates of the $\ell + 1$ break points. Note, finally, that the
results discussed in this section, including (22), hold true for partial structural changes.
Also, it is important to note that the results carry through allowing different
distributions across segments for the regressors and the errors. That is, Proposition 8 remains
valid under A7(a and c) instead of A9-A10, provided $\hat\sigma^2$ is replaced by $\hat\sigma_i^2$ in (20).

4.4     Extensions  to  serially  correlated  errors. 

The tests discussed above can be applied without the imposition of serially
uncorrelated errors as specified in Assumption A10. In this case, some modifications are
necessary to take account of the change in the limiting distribution of the statistics
under the null hypothesis. A simple modification is to use the following version of the
F-test instead of that specified in (16):

$$F_T^*(\lambda_1, \ldots, \lambda_k; q) = \frac{1}{T} \left(\frac{T - (k+1)q - p}{kq}\right) \hat\delta' R' (R \hat V(\hat\delta) R')^{-1} R \hat\delta, \tag{23}$$

where $\hat V(\hat\delta)$ is an estimate of the variance covariance matrix of $\hat\delta$ that is robust to
serial correlation and heteroskedasticity; i.e. a consistent estimate of

$$V(\hat\delta) = \operatorname{plim}\, T (\bar Z' M_X \bar Z)^{-1} \bar Z' M_X \Omega M_X \bar Z (\bar Z' M_X \bar Z)^{-1}. \tag{24}$$

A consistent estimate of $V(\hat\delta)$ can be obtained using methods such as those
suggested by, e.g., Andrews (1991). Again, note that this estimate can be constructed
allowing identical or different distributions for the regressors and the errors across
segments. In some instances, the form of the statistic reduces in an interesting way.
Consider the case of a pure structural change model ($\beta = 0$) where the explanatory
variables are such that

$$\operatorname{plim}\, \frac{1}{T} \bar Z' \Omega \bar Z = h_u(0)\, \operatorname{plim}\, \frac{1}{T} \bar Z' \bar Z \tag{25}$$

with $h_u(0)$ the spectral density function of the errors $u_t$ evaluated at the zero frequency.
In that case,

$$V(\hat\delta) = h_u(0)\, \operatorname{plim} \left(\frac{\bar Z' \bar Z}{T}\right)^{-1},$$

and the robust version of the F-test can be constructed as:

$$F_T^*(\lambda_1, \ldots, \lambda_k; q) = \frac{1}{T} \left(\frac{T - (k+1)q - p}{kq}\right) \hat\delta' R' (R (\bar Z' \bar Z)^{-1} R')^{-1} R \hat\delta \,/\, \hat h_u(0),$$

with $\hat h_u(0)$ a consistent estimate of $h_u(0)$. In that case, we have the following
asymptotically equivalent test

$$F_T^*(\lambda_1, \ldots, \lambda_k; q) = \frac{\hat\sigma^2}{\hat h_u(0)}\, F_T(\lambda_1, \ldots, \lambda_k; q),$$

with $\hat\sigma^2 = T^{-1} \sum_{t=1}^{T} \hat u_t^2$ a consistent estimate of the variance of the residuals. Hence,
the robust version of the test is simply a scaled version of the original statistic. The
condition (25) holds, for example, when testing for a change in mean as in Garcia
and Perron (1994).
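
The rescaling can be illustrated with a simple kernel estimate. The sketch below uses a Bartlett kernel with a user-supplied bandwidth, which is a simplification: the data-dependent bandwidth selection of Andrews (1991) is not implemented. It also assumes that $h_u(0)$ is normalized so that it equals $\sigma^2$ when the errors are uncorrelated, as the scaling factor $\hat\sigma^2 / \hat h_u(0)$ suggests. All function names are hypothetical.

```python
def autocov(u, j):
    """Sample autocovariance of order j for residuals u (assumed to have mean zero)."""
    n = len(u)
    return sum(u[t] * u[t - j] for t in range(j, n)) / n

def long_run_variance(u, bandwidth):
    """Bartlett-kernel estimate: gamma_0 + 2 * sum_{j=1}^{b} (1 - j / (b + 1)) * gamma_j."""
    lrv = autocov(u, 0)
    for j in range(1, bandwidth + 1):
        lrv += 2.0 * (1.0 - j / (bandwidth + 1.0)) * autocov(u, j)
    return lrv

def robust_sup_f(sup_f, residuals, bandwidth):
    """Scale the original statistic by sigma^2-hat over the long-run variance estimate."""
    sigma2 = autocov(residuals, 0)
    return sigma2 / long_run_variance(residuals, bandwidth) * sup_f
```

With positively autocorrelated residuals the long-run variance exceeds $\hat\sigma^2$, so the scaled statistic is smaller than the original one; with a bandwidth of zero the two coincide.
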

The computation of the robust version of the F-test (23) can be quite
involved when considering all combinations of possible break points, especially if a
data dependent method is used to construct the robust asymptotic covariance matrix
of $\hat\delta$. An asymptotically equivalent version is to first take the supremum of the original
F-test to obtain the break points, i.e. imposing $\Omega = \sigma^2 I$. This can be done since the
break fractions are T-consistent even in the presence of correlated errors. One can
then obtain a robust version of the test by evaluating (23) and (24) at these estimated
break dates.

The extensions for the $\mathrm{Dmax}\,F_T$ test and the sequential $F_T(\ell + 1|\ell)$ test are similar
since they are simply functions of the $\sup F_T(k, q)$ tests.

4.5      Consistency  of  the  tests. 

A test is consistent if, under the alternative hypothesis, the associated test statistic
diverges to infinity as the sample size increases. Because $\sup F_T(1; q) \le 2 \sup F_T(2; q) \le
\cdots \le k \sup F_T(k; q)$, the consistency of the $\sup F_T(k; q)$ test ($k \ge 2$) follows immediately from the
results of Andrews (1993) who proved that the test based on the statistic $\sup F_T(1; q)$
is consistent for various alternatives including multiple breaks. Consequently, the test
based on the statistic $\mathrm{Dmax}\,F_T(M, q)$ is also consistent.

We next argue that the test based on $F_T(\ell + 1|\ell)$ is also consistent. If there are
more than $\ell$ breaks and a model with only $\ell$ breaks is estimated, there must be at least
one break that is not estimated. Hence, at least one segment contains a nontrivial
break point in the sense that both boundaries of this segment are separated from
the true break point by a positive fraction of the total number of observations. For
this segment, the $\sup F_T(1; q)$ test statistic diverges to infinity as the sample size
increases since it is consistent. Accordingly, the statistic $F_T(\ell + 1|\ell)$ (computed for
$\ell + 1$ segments) also diverges to infinity. This shows consistency.

5      Sequential  Methods. 

In this section we discuss issues related to the sequential estimation of the break
points. We start, in section 5.1, with some results about the limit of break point
estimates in underspecified models, i.e. when the regression structure allows for a smaller
number of breaks than contained in the data-generating process. An interesting
byproduct of this analysis is the possibility of a sequential algorithm to estimate models
with an unknown number of breaks. This is discussed in section 5.2.

5.1      The Limit of Break Point Estimates in Underspecified Models.

In this section, we show that the estimate of the break fraction in a single structural
change regression applied to data that contain two breaks converges to one of the two
true break fractions. In independent work, Chong (1994) obtains a similar result (see
also Bai (1994c) for an earlier exposition).

To present our arguments, we consider a simple three-regime model:

$$\begin{aligned} y_t &= \mu_1 + \varepsilon_t, && \text{if } t \le [T\lambda_1], \\ y_t &= \mu_2 + \varepsilon_t, && \text{if } [T\lambda_1] + 1 \le t \le [T\lambda_2], \\ y_t &= \mu_3 + \varepsilon_t, && \text{if } [T\lambda_2] + 1 \le t \le T, \end{aligned} \tag{26}$$

with $\varepsilon_t \sim$ i.i.d.$(0, \sigma^2)$. Assume $\mu_1 \ne \mu_2$, $\mu_2 \ne \mu_3$, and $\lambda_1 < \lambda_2$, so there are two break
points in the model. Let $\hat T_a$ denote the estimated single shift point. Our aim is to
show that despite the misspecification of the number of regimes, $\hat T_a / T$ is consistent
for either $\lambda_1$ or $\lambda_2$ depending on the relative magnitudes of the shifts and the spell of
each regime. To verify this claim, we examine the global behavior of $S(\tau)$, defined as
the limit of $T^{-1} S_T([T\tau])$. We define $S_T(0)$ and $S_T(T)$ as the sum of squared residuals
for the full sample without a break (i.e. $\sum_t (y_t - \bar y)^2$, where $\bar y$ is the sample mean).
In this way, $S_T([T\tau])$ is well defined for all $\tau \in [0, 1]$. It is not difficult to show that
the convergence of $T^{-1} S_T([T\tau])$ to $S(\tau)$ is uniform in $\tau \in [0, 1]$. In particular

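$$\frac{1}{T} S_T([T\lambda_1]) \xrightarrow{p} S(\lambda_1) = \sigma^2 + \frac{(1 - \lambda_2)(\lambda_2 - \lambda_1)}{1 - \lambda_1} (\mu_2 - \mu_3)^2 \tag{27}$$

and

$$\frac{1}{T} S_T([T\lambda_2]) \xrightarrow{p} S(\lambda_2) = \sigma^2 + \frac{\lambda_1}{\lambda_2} (\lambda_2 - \lambda_1)(\mu_1 - \mu_2)^2. \tag{28}$$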

Without loss of generality we consider the case where $S(\lambda_1) < S(\lambda_2)$. Our claim is
stated in the following lemma.

Lemma 3 Suppose that the data are generated by (26) and that $S(\lambda_1) < S(\lambda_2)$. Then the
estimated single break point $\hat T_a / T$ is consistent for $\lambda_1$.

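Proof: This can be proved by showing that $S(\tau)$ for $\tau \in [0, 1]$ has a unique minimum
at $\lambda_1$. The function $S(\tau)$ has different expressions over $[0, 1]$. Some algebra reveals
that

$$S(\tau) - S(\lambda_1) = \frac{\lambda_1 - \tau}{(1 - \tau)(1 - \lambda_1)} \big[(1 - \lambda_1)(\mu_1 - \mu_2) + (1 - \lambda_2)(\mu_2 - \mu_3)\big]^2, \qquad \tau \le \lambda_1,$$

which is nonnegative. Under the assumption that $S(\lambda_1) < S(\lambda_2)$, the expression in the
bracket above is nonzero, so $S(\tau) - S(\lambda_1)$ is strictly positive for $\tau < \lambda_1$. By symmetry
(regarded as reversing the data order), $S(\tau) - S(\lambda_2)$ is nonnegative for $\tau \ge \lambda_2$. Thus
for $\tau \in [\lambda_2, 1]$, $S(\tau) - S(\lambda_1) = [S(\tau) - S(\lambda_2)] + [S(\lambda_2) - S(\lambda_1)] \ge S(\lambda_2) - S(\lambda_1) > 0$. It
remains to consider the case where $\tau \in (\lambda_1, \lambda_2)$. Again, simple algebra shows

$$\begin{aligned} S(\tau) - S(\lambda_1) &= (\tau - \lambda_1) \left[ \frac{\lambda_1}{\tau} (\mu_1 - \mu_2)^2 - \frac{(1 - \lambda_2)^2}{(1 - \tau)(1 - \lambda_1)} (\mu_2 - \mu_3)^2 \right] \\ &\ge (\tau - \lambda_1) \left[ \frac{\lambda_1}{\tau} (\mu_1 - \mu_2)^2 - \frac{1 - \lambda_2}{1 - \lambda_1} (\mu_2 - \mu_3)^2 \right] \\ &\ge \frac{\tau - \lambda_1}{\lambda_2 - \lambda_1} \big[ S(\lambda_2) - S(\lambda_1) \big], \end{aligned}$$

where the first inequality follows from $(1 - \lambda_2)/(1 - \tau) < 1$ and the second inequality follows
from $\tau / \lambda_2 < 1$. Thus $S(\tau) - S(\lambda_1)$ is strictly positive for $\tau \in (\lambda_1, \lambda_2)$. We have thus
shown that $S(\tau)$ has a unique global minimum at $\lambda_1$ when $S(\lambda_1) < S(\lambda_2)$. Now
because $S_T(\hat T_a) \le S_T([T\lambda_1])$, it follows that $\hat T_a / T$ is consistent for $\lambda_1$. $\Box$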
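
The lemma can be checked numerically for given parameter values. The following sketch evaluates the limit function $S(\tau)$ for piecewise-constant means by computing, on each side of $\tau$, the variation of the regime means around the segment average; the function name and the example parameter values are assumptions chosen for illustration.

```python
def limit_ssr(tau, lam, mu, sigma2=1.0):
    """S(tau): limit of T^{-1} S_T([T tau]) when the mean is mu[j] on the j-th regime cut at lam."""
    cuts = [0.0] + list(lam) + [1.0]

    def within(a, b):
        """Integral over [a, b] of the squared deviation of the regime mean from its average."""
        if b <= a:
            return 0.0
        pieces = []
        for lo, hi, m in zip(cuts, cuts[1:], mu):
            w = min(hi, b) - max(lo, a)           # length of the overlap with this regime
            if w > 0:
                pieces.append((w, m))
        avg = sum(w * m for w, m in pieces) / (b - a)
        return sum(w * (m - avg) ** 2 for w, m in pieces)

    return sigma2 + within(0.0, tau) + within(tau, 1.0)

lam, mu = (0.3, 0.7), (0.0, 4.0, 5.0)             # first shift is much larger than the second
grid = [i / 1000 for i in range(1001)]
tau_star = min(grid, key=lambda t: limit_ssr(t, lam, mu))
```

With these values $S(\lambda_1)$ is smaller than $S(\lambda_2)$, and the grid minimizer `tau_star` coincides with $\lambda_1 = 0.3$, as the lemma predicts.
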

The assumption that $S(\lambda_1) < S(\lambda_2)$ implies that the first break point is more
pronounced or dominating in terms of the relative magnitude of shifts and the regime
spells. The above lemma implies that only when the dominating break is identified
can the sum of squared residuals be reduced the most. Given that $\hat T_a / T$ is consistent
for $\lambda_1$, one can use the subsample $[\hat T_a, T]$ to estimate another break point such that the
sum of squared residuals is minimized for this subsample. The resulting estimate is
then consistent for $\lambda_2$. This follows from the same type of argument as in the preceding
paragraph because only $\lambda_2$ can be the dominating break in the sample $[\hat T_a, T]$, even if
$\hat T_a < [T\lambda_1]$. Hence, if one knows that $\hat T_a / T$ is consistent for $\lambda_1$, a consistent estimator
for $\lambda_2$ can also be obtained.

It is relatively straightforward to extend the argument to the case where a
one-break model is fitted to a relationship that exhibits more than two breaks. The
principle is the same and the estimate of the break fraction converges to one of the
true break fractions, namely the one which allows the greatest reduction in the sum of
squared residuals. It is also conjectured that a similar result holds when, say, an $m_1$
break model is fitted to a relationship that has $m_2$ breaks (with $m_2 > m_1$). Such a
general result is not, however, needed for the arguments that follow.

5.2      Sequential  estimation  of  the  break  points. 

The arguments in section 5.1 showed that $\hat T_a/T$ is consistent for one of the true break
points, the one that allows the greatest reduction in the sum of squared residuals.
Suppose, as above, that this break point is $\lambda_1$ which, in general, may not be known
(i.e., we do not know if the other break is before or after). In that case, we choose one
break point either in the interval $[1, \hat T_a]$ or in the interval $[\hat T_a, T]$, such that the sum of
squared residuals for all observations $[1, T]$ is minimized. With probability tending to
1 as the sample size increases, the estimated break point will be in the interval $[\hat T_a, T]$.
This follows since, in the absence of a break in the interval $[1, \hat T_a]$, allowing one more
break in that segment will not significantly reduce the sum of squared residuals. On
the other hand, allowing one more break in the interval $[\hat T_a, T]$ will permit a large
reduction given the presence of a break. Notationally, with probability tending to 1,

$$\min\Big\{ \inf_{\tau \in [1, \hat T_a]} S_T(\tau, \hat T_a),\ \inf_{\tau \in [\hat T_a, T]} S_T(\hat T_a, \tau) \Big\} = \inf_{\tau \in [\hat T_a, T]} S_T(\hat T_a, \tau) = S_T(\hat T_a, \hat\tau),$$

where $\hat\tau$ is equivalently obtained by minimizing the sum of squared residuals for the
subsample $[\hat T_a, T]$, and thus $\hat\tau/T$ is consistent for $\lambda_2$. The preceding argument implies
that we can obtain consistent estimates of $\lambda_1$ and $\lambda_2$ in a sequential way.

Similarly, if $\hat T_a$ is actually consistent for $\lambda_2$ (this will be true if $S(\lambda_1) > S(\lambda_2)$), the
second estimated shift point will be in $[1, \hat T_a]$. Generally, let $(N_1, N_2)$ be the ordered
version of $(\hat T_a, \hat\tau)$ such that $N_1 < N_2$. Then $(N_1/T, N_2/T)$ is consistent for $(\lambda_1, \lambda_2)$.

5.2.1  Sequential  estimation  with  a  known  number  of  break  points. 

The  above  analysis  suggests  a  straightforward  algorithm  for  estimating  models  with 
multiple  break  points  whether  the  number  of  break  points  is  known  or  unknown. 
Consider  first  the  case  of  a  known  number  of  break  points,  say  m.  The  idea  is  to 
estimate  the  breaks  sequentially  rather  than  simultaneously.  Once  the  first  break 
point  is  identified,  the  sample  is  split  into  two  subsamples  separated  by  this  first 
estimated  break  point.  For  each  subsample,  a  one  break  model  is  estimated  and  the 
second  break  point  is  chosen  as  that  break  point  (of  the  two  obtained)  which  allows 
the greatest reduction in the sum of squared residuals. The sample is then partitioned
into three regimes, and a third break point is selected as the one, among the three estimates
from the three estimated one-break models, that allows the greatest reduction in the sum of
squared residuals. The process is continued until the $m$ break points are selected.

The procedure is simple to implement using existing least squares routines with
minor modifications. It yields consistent estimates of the break points, though the
estimates are not guaranteed to be identical to those obtained by global minimization.
Interestingly, it allows the estimation of models with any number of structural changes
using a number of least-squares estimations that is only of order $O(T)$.
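
The sequential scheme described above can be sketched in a few lines of code. The following is only an illustration under simplifying assumptions that are not part of the paper: the regression contains a constant only (a pure mean-shift model), each segment must contain at least `h` observations, and the function names (`make_ssr`, `best_single_break`, `sequential_breaks`) are hypothetical. A break index `b` means that observations `y[:b]` and `y[b:]` fall in different regimes.

```python
from itertools import accumulate


def make_ssr(y):
    """Return ssr(i, j): sum of squared residuals of y[i:j] around its mean."""
    s1 = [0.0] + list(accumulate(y))
    s2 = [0.0] + list(accumulate(v * v for v in y))

    def ssr(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    return ssr


def best_single_break(ssr, start, end, h):
    """Fit a one-break model to y[start:end] with at least h points per side.

    Returns (SSR reduction, break index) or None if the segment is too short.
    """
    if end - start < 2 * h:
        return None
    base = ssr(start, end)
    best = None
    for b in range(start + h, end - h + 1):
        gain = base - (ssr(start, b) + ssr(b, end))
        if best is None or gain > best[0]:
            best = (gain, b)
    return best


def sequential_breaks(y, m, h=2):
    """Estimate up to m break points one at a time (mean-shift model only)."""
    ssr = make_ssr(y)
    breaks = []
    for _ in range(m):
        edges = [0] + sorted(breaks) + [len(y)]
        # One-break model in every current segment.
        candidates = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            cand = best_single_break(ssr, lo, hi, h)
            if cand is not None:
                candidates.append(cand)
        if not candidates:
            break
        # Keep the candidate with the greatest reduction in the SSR.
        _, b = max(candidates)
        breaks.append(b)
    return sorted(breaks)


y = [0.0] * 20 + [5.0] * 20 + [2.0] * 20
print(sequential_breaks(y, m=2))
```

Because segment sums are cached as prefix sums, each one-break search costs a number of operations proportional to the segment length, which is in line with the order $O(T)$ count of least-squares fits mentioned above.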

5.2.2  Sequential  estimation  with  an  unknown  number  of  breaks. 

Consider now the case of an unknown number of breaks, which is likely to be of
particular relevance in practice. A standard problem with any estimation procedure
is that an improvement in the objective function is always possible by allowing more
breaks. This naturally leads one to consider a penalty factor for the increased dimension
of a model. Yao (1988) suggests the use of the Bayesian Information Criterion (BIC)
defined as

$$\mathrm{BIC}(m) = \ln \hat\sigma^2(m) + p^* \ln(T)/T,$$

where $p^* = (m+1)q + m + p$, and $\hat\sigma^2(m) = T^{-1} S_T(\hat T_1, \ldots, \hat T_m)$. He showed that
the  number  of  breaks  can  be  consistently  estimated.  Many  other  criteria  such  as  the 
adjusted  R2,  the  prediction  error  criterion  and  Mallows  Cp  can  be  used  as  well.  The 
AIC  criterion  is  not  recommended  because,  as  is  widely  known,  it  has  a  tendency 
to  overestimate  the  dimension  of  a  model.  An  alternative  proposed  by  Liu,  Wu  and 
Zidek  (1994)  is  a  modified  Schwarz'  criterion  that  takes  the  form: 

$$\mathrm{MIC}(m) = \ln\big(S_T(\hat T_1, \ldots, \hat T_m)/(T - p^*)\big) + (p^*/T)\, c_0 (\ln T)^{2+\delta_0}.$$

They suggest using $\delta_0 = 0.1$ and $c_0 = 0.299$.
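
As an illustration of how the two criteria are used, the sketch below evaluates BIC and the modified Schwarz criterion from a table of minimized sums of squared residuals and returns the number of breaks with the smallest criterion value. The function names and the SSR values in the usage example are hypothetical and serve only to show the mechanics; they are not results from the paper.

```python
import math


def bic(ssr, T, m, q, p=0):
    """BIC(m) = ln(ssr / T) + p_star * ln(T) / T, with p_star = (m + 1)q + m + p."""
    p_star = (m + 1) * q + m + p
    return math.log(ssr / T) + p_star * math.log(T) / T


def lwz(ssr, T, m, q, p=0, c0=0.299, delta0=0.1):
    """Modified Schwarz criterion of Liu, Wu and Zidek (1994)."""
    p_star = (m + 1) * q + m + p
    penalty = (p_star / T) * c0 * math.log(T) ** (2 + delta0)
    return math.log(ssr / (T - p_star)) + penalty


def select_m(ssr_by_m, T, q, p=0, criterion=bic):
    """Return the number of breaks m that minimizes the chosen criterion."""
    return min(ssr_by_m, key=lambda m: criterion(ssr_by_m[m], T, m, q, p))


# Hypothetical minimized SSR for m = 0, 1, 2, 3 breaks (illustration only).
ssr_by_m = {0: 400.0, 1: 150.0, 2: 60.0, 3: 58.0}
print(select_m(ssr_by_m, T=100, q=1))
print(select_m(ssr_by_m, T=100, q=1, criterion=lwz))
```

Note that both criteria require the minimized SSR for every candidate $m$ up to some upper bound, so models with more breaks than are actually present must be estimated.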

We propose an alternative method to determine the number of breaks. The approach
is directly related to the sequential procedure outlined above. Start by estimating
a model with a small number of breaks that are thought to be necessary (or
start with no break). Then perform parameter-constancy tests for each subsample
(those obtained by cutting off at the estimated breaks), adding a break to a subsample
associated with a rejection with the test $F_T(\ell+1|\ell)$. This process is repeated,
increasing $\ell$ sequentially, until the test $F_T(\ell+1|\ell)$ fails to reject the null hypothesis
of no additional structural changes. The final number of breaks is thus equal to the
number of rejections obtained with the parameter-constancy tests plus the number of
breaks used in the initial round.

It is important to note that the application of the test $F_T(\ell+1|\ell)$ in this sequential
context is rather different from that discussed earlier. Indeed, the result of Proposition
8 is based on having the first $\ell$ breaks obtained simultaneously, i.e. as global minimizers
of the sum of squared residuals assuming $\ell$ breaks. The reason for this is that the
stated limiting distribution of the test requires convergence of the estimates of the
break fractions at rate $T$. Fortunately, this rate $T$ convergence extends to the case
where the break points are obtained sequentially one at a time. This last result is
proved in a very recent study by Bai (1995b). Hence, the limiting distribution of the
$F_T(\ell+1|\ell)$ test in the current sequential setup is the same as that stated in Proposition
8.
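
The test-based selection of the number of breaks can be sketched as follows. This is a simplified illustration and not the paper's implementation: it assumes a mean-shift model, starts from no break, estimates the error variance from the model with the additional break and ignores serial correlation, and takes the critical values from a caller-supplied function `crit_value` (hypothetical; the appropriate values depend on the limiting distribution in Proposition 8 and are not reproduced here).

```python
from itertools import accumulate


def select_breaks_sequentially(y, crit_value, max_breaks=5, h=2):
    """Add breaks one at a time while an F(l+1 | l)-type statistic rejects.

    Mean-shift model only. crit_value(l) returns the critical value used when
    testing l against l + 1 breaks; it is a caller-supplied, hypothetical input.
    """
    s1 = [0.0] + list(accumulate(y))
    s2 = [0.0] + list(accumulate(v * v for v in y))

    def ssr(i, j):
        # Sum of squared residuals of y[i:j] around its own mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    breaks = []
    while len(breaks) < max_breaks:
        edges = [0] + breaks + [len(y)]
        segments = list(zip(edges[:-1], edges[1:]))
        ssr_null = sum(ssr(lo, hi) for lo, hi in segments)

        # Largest SSR reduction from one extra break in any current segment.
        best_gain, best_b = 0.0, None
        for lo, hi in segments:
            for b in range(lo + h, hi - h + 1):
                gain = ssr(lo, hi) - ssr(lo, b) - ssr(b, hi)
                if gain > best_gain:
                    best_gain, best_b = gain, b
        if best_b is None:
            break

        # Assumption of this sketch: the error variance is estimated from the
        # model with the extra break, ignoring serial correlation.
        sigma2 = (ssr_null - best_gain) / len(y)
        stat = float("inf") if sigma2 <= 0 else best_gain / sigma2
        if stat <= crit_value(len(breaks)):
            break  # fail to reject: keep the current number of breaks
        breaks = sorted(breaks + [best_b])
    return breaks


noise = [0.1 if t % 2 == 0 else -0.1 for t in range(60)]
levels = [0.0] * 20 + [5.0] * 20 + [2.0] * 20
y = [a + e for a, e in zip(levels, noise)]
print(select_breaks_sequentially(y, crit_value=lambda l: 15.0))
```

The constant critical value 15.0 in the usage lines is arbitrary. The design point is that the loop stops at the first non-rejection, so no model with more than one break beyond the selected number is ever examined.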

With probability approaching 1 as the sample size increases, the number of breaks
determined this way will be no less than the true number. The procedure does not
provide a consistent estimate of the true number of breaks, say $m_0$. This is because the
sequential method is based on a test procedure which implies a non-zero probability of
rejection under the null hypothesis given by the level of the test, say $\alpha$. However, the
asymptotic probability of selecting a model with a larger number of breaks, say $m_0 + j$,
is given by $\alpha^j$ which decreases rapidly. Hence, there is no need (with large probability)
to estimate models with more than the true number of breaks, as is necessary when
using a model selection approach based on an information criterion.

The sequential procedure could be made consistent by adopting a significance level
for the test $F_T(\ell+1|\ell)$ that decreases to zero, at a suitable rate, as the sample size
increases. A result to that effect is presented in the next proposition whose proof is
provided in the appendix.

Proposition 9 Let $\hat m$ be the number of breaks obtained using the sequential method
based on the statistic $F_T(\ell+1|\ell)$ applied with some size $\alpha_T$, and let $m_0$ be the true
number of breaks. If $\alpha_T$ converges to 0 slowly enough (for the test based on $F_T(\ell+1|\ell)$
to remain consistent), then, under assumptions A1-A5,

$$P(\hat m = m_0) \to 1,$$

as $T \to \infty$. That is, the estimated number of breaks is consistent for the true number.

6      Empirical  Applications. 

In  this  section,  we  discuss  two  empirical  applications  of  the  procedures  presented  in 
this  paper.  The  first  analyzes  the  U.S.  ex-post  real  interest  rate  series  considered  by 
Garcia  and  Perron  (1994).  The  second  reevaluates  some  findings  of  Alogoskoufis  and 
Smith  (1991)  who  analyze  the  issue  of  changes  in  the  persistence  of  inflation  and  the 
corresponding  shifts  in  an  expectations-augmented  Phillips  curve  resulting  from  such 
changes  in  persistence. 

6.1      The U.S. Ex-Post Real Interest Rate.

Garcia  and  Perron  (1994)  considered  the  time  series  properties  of  the  U.S.  Ex-Post 
real  interest  rate  (constructed  from  the  three-month  treasury  bill  rate  deflated  by  the 
CPI  inflation  rate  taken  from  the  Citibase  data  base).  The  data  are  quarterly  and  the 
sample  is  1961:1-1986:3.  Figure  1  presents  a  graph  of  the  series.  The  issue  of  interest 
is the presence of structural changes in the mean of the series. To that effect we apply
our procedure with only a constant as regressor (i.e. $z_t = \{1\}$) and take into account
potential serial correlation via non-parametric adjustments. In the implementation of
the procedure, we allowed up to 5 breaks and each segment was constrained to have
at least 7 observations. The results are presented in Table 3.

The first issue to be considered is the determination of the number of breaks. Here
the $\sup F_T(k)$ tests are all significant for $k$ between 1 and 5. So at least one break is
present. The $\sup F_T(2|1)$ test takes value 34.32 and is therefore highly significant. The
sequential procedure (using a 1% significance level), BIC and the modified Schwarz
criterion of Liu, Wu and Zidek (1994) all select two breaks. Hence, we conclude in
favor  of  the  presence  of  two  breaks.  Of  direct  interest  are  the  estimates  obtained 
under  global  minimization.  The  break  dates  are  estimated  at  1972:3  and  1980:3.  The 
first  date  has  a  rather  large  confidence  interval  (between  1971:2  and  1973:4  at  the 
95%  level).  The  second  break  date  is,  however,  precisely  estimated  since  the  95% 
confidence  interval  covers  only  one  quarter  before  and  after.  The  differences  in  the 
estimated  means  over  each  segment  are  significant  and  point  to  a  decrease  of  3.16% 
in  late  1972  and  a  large  increase  of  7.44%  in  late  1980.  These  results  confirm  those  of 
Garcia  and  Perron  (1994). 

6.2      Changes  in  the  Persistence  of  Inflation  and  the  Phillips 
Curve. 

Alogoskoufis  and  Smith  (1991)  consider  the  following  version  of  an  expectations- 
augmented  Phillips  curve: 

$$\Delta w_t = \alpha_1 + \alpha_2 E(\Delta p_t | I_{t-1}) + \alpha_3 \Delta u_t + \alpha_4 u_{t-1} + \xi_t,$$

where $w_t$ is the log of nominal wages, $p_t$ is the log of the Consumer Price Index, and
$u_t$ is the unemployment rate. They posit that inflation is an AR(1) so that

$$E(\Delta p_t | I_{t-1}) = \delta_1 + \delta_2 \Delta p_{t-1}. \tag{29}$$

Hence, upon substitution, the Phillips curve is:

$$\Delta w_t = \gamma_1 + \gamma_2 \Delta p_{t-1} + \gamma_3 \Delta u_t + \gamma_4 u_{t-1} + \xi_t, \tag{30}$$

where $\gamma_2 = \alpha_2 \delta_2$. Here, a parameter of importance is $\delta_2$ which is interpreted as
measuring  the  persistence  of  inflation.  Using  post-war  annual  data  from  the  United 
Kingdom  and  the  United  States,  Alogoskoufis  and  Smith  (1991)  argue  that  the  process 
describing  inflation  exhibits  a  one-time  structural  change  from  1967  to  1968,  whereby 
the autoregressive parameter $\delta_2$ is significantly higher in the second period. This is
interpreted as evidence that the abandonment of the Bretton Woods system relaxed
the discipline imposed by the gold standard and created higher persistence in inflation.
They also argue that the parameter $\gamma_2$ in the Phillips curve equation (30) exhibits a
similar increase at the same time, thereby lending support to the empirical significance
of the Lucas critique.

Using the methods presented in this paper, we reevaluate the claims of Alogoskoufis and
Smith (1991) using post-war annual data for the United Kingdom (see footnote 7). Consider first the
structural stability of the AR(1) representation of inflation whose series is depicted in
Figure 2. Details of the estimation results are contained in Table 4. When applying
a one-break model, we indeed find the same results, namely a structural change in
1967 with $\delta_2$ increasing from .274 to .739 while $\delta_1$ remains constant. The break date
is, however, imprecisely estimated with a 95% confidence interval covering
the period 1961-1973. More importantly, the $\sup F_T(1)$ test is not significant at
any conventional level, indicating that the data do not support a one-break model.
The $\sup F_T(2)$ test is, however, significant at the 5% level and the $\sup F_T(2|1)$ test is
significant at the 10% level. The $\sup F_T(\ell+1|\ell)$ test is not significant for any $\ell \ge 2$.
BIC selects two breaks and MIC selects one, suggesting that the latter may impose
too strong a penalty when the sample size is small. Overall, the tests support a
two-break model.

The estimates of a two-break model reveal a rather different interpretation of the
data. The first break date is not linked to the end of the Bretton Woods system but
rather to the first oil price shock in 1973. The second break is located in 1980. The
coefficient estimates point to the importance of shifts in the level of inflation rather
than changes in persistence. Indeed, the coefficient $\delta_1$ varies from .021 to .130 in the
period 1973-1980, and back to .011 after 1980. If anything, the data suggest a
significant decrease in the persistence of inflation in the period 1973-1980, while the
estimates of the autoregressive parameters are not significantly different in the first
and last segments.

Since there indeed appear to be structural changes in the inflation process, it is of
interest to see if the Phillips curve equation underwent similar changes in accordance
with the Lucas critique. The results are presented in Table 5. Here the evidence points
to a single structural change. The $\sup F_T(k)$ tests are significant for all $k$ while the
$\sup F_T(\ell+1|\ell)$ tests are not significant for any $\ell \ge 1$. Furthermore, both the sequential
procedure and the criterion of Liu, Wu and Zidek (1994) select a one-break model (only
BIC chooses two breaks). Again, the estimate of the break is associated with the first
oil price shock (1973) and not with the end of the Bretton Woods system (the 95%
confidence interval is small). The estimate of $\gamma_2$ indeed shows a marked decrease
similar to the decrease in the persistence of inflation (the parameter estimate even
becomes negative but not significantly so). Given the changes in the estimates of
the parameters $\gamma_3$ and $\gamma_4$, the data suggest that the Phillips curve itself underwent a
structural change in 1973. The data, however, do not support any adjustment of the
Phillips curve following the change in the inflation process in 1980.

7 The data are the same as in Alogoskoufis and Smith (1991) and were kindly provided by George
Alogoskoufis. We refer the reader to their paper for details on the definition and source of each series.

Since BIC selects two breaks, we also present results for this specification. These
lend even less empirical support to the Lucas critique since the break dates are 1966
and 1975 and do not correspond to those of the inflation process.

7     Conclusions. 

Our  analysis  has  presented  a  rather  comprehensive  treatment  of  issues  related  to  the 
estimation  of  linear  models  with  multiple  structural  changes,  to  tests  for  the  presence  of 
multiple  structural  changes  and  to  the  determination  of  the  number  of  changes  present. 
Our  results  being  asymptotic  in  nature,  there  is  certainly  a  need  to  evaluate  the  quality 
of  the  approximations  and  the  power  of  the  tests  in  finite  samples  via  simulations.  We 
intend  to  present  such  a  simulation  study  in  a  subsequent  paper.  Among  the  topics 
to  be  investigated,  an  important  one  appears  to  be  the  relative  merits  of  different 
methods to select the number of structural changes. There are, of course, many other
issues on the agenda. These include, for instance, extensions of the test procedures to include tests
that are optimal with respect to some criteria, extensions to nonlinear models and the
derivation of tests that are valid in the presence of trending regressors.

A     Mathematical Appendix.

As a matter of notation, we let $o_p(1)$ and $O_p(1)$ denote, respectively, a sequence
of random variables converging to zero in probability and one that is stochastically
bounded. Unless indicated otherwise, all limits are taken as $T$, the sample size,
increases to infinity. For a sequence of matrices $B_T$, we write $B_T = o_p(1)$ if each of its
elements is $o_p(1)$ and likewise for $O_p(1)$. For a matrix $A$, $M_A$ denotes the orthogonal
projection matrix, $I - A(A'A)^{-1}A'$. We use $\|\cdot\|$ to denote the Euclidean norm, i.e.
$\|x\| = (\sum_{i=1}^{p} x_i^2)^{1/2}$ for $x \in R^p$. For a matrix $A$, we use the vector-induced norm, i.e.
$\|A\| = \sup_{x \ne 0} \|Ax\|/\|x\|$. We note that the norm of $A$ is equal to the square root of
the maximum eigenvalue of $A'A$, and thus $\|A\| \le [\mathrm{tr}(A'A)]^{1/2}$. Also, for a projection
matrix $P$, $\|PA\| \le \|A\|$. Finally, $[a]$ represents the integer part of $a$.

Proof of Lemma 1: We start with a series of lemmas that will be used subsequently.
Assumption A5 is assumed throughout.

Lemma A.1 Let $S$ and $V$ be two matrices having the same number of rows. Then the
matrix $S'M_VS$ is non-decreasing as more observations (rows) are added to the matrix
$(S, V)$.

Proof: Write $S = (S_1', S_2')'$ and $V = (V_1', V_2')'$. We need to show that for an
arbitrary vector $\alpha$ (having the same dimension as the number of columns of $S$)

$$\alpha' S' M_V S \alpha \ge \alpha' S_1' M_{V_1} S_1 \alpha.$$

Note that $\alpha' S' M_V S \alpha$ is the sum of squares of the residuals from a projection of $S\alpha$
on the space spanned by $V$. Similarly, $\alpha' S_1' M_{V_1} S_1 \alpha$ is the sum of squared residuals
from a projection of $S_1\alpha$ on $V_1$. The inequality is verified using the fact that the sum
of squared residuals is non-decreasing as the number of observations increases (here
the number of rows of $S_1$ and $S$). See, e.g., Brown, Durbin and Evans (1975). □
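
The monotonicity in Lemma A.1 is easy to check numerically in the simplest special case, where $S$ and $V$ each have a single column so that $S'M_VS$ is a scalar: the residual sum of squares from regressing $s$ on $v$ without an intercept. The sketch below uses arbitrary illustrative numbers and a hypothetical helper name `residual_ss`; it is a sanity check of that special case, not a proof.

```python
def residual_ss(s, v):
    """Sum of squared residuals from projecting s on v (one regressor, no intercept)."""
    svv = sum(b * b for b in v)
    ssv = sum(a * b for a, b in zip(s, v))
    sss = sum(a * a for a in s)
    return sss - ssv * ssv / svv


# Arbitrary illustrative data; rows are added one at a time.
s = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0]
v = [1.0, 1.0, 2.0, 1.0, -1.0, 2.0]
values = [residual_ss(s[:n], v[:n]) for n in range(1, len(s) + 1)]

# Each added observation can only leave the residual sum of squares
# unchanged or increase it.
print(all(b >= a - 1e-12 for a, b in zip(values, values[1:])))
```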

Lemma A.2 Under assumption A1,

$$\sup_{T_1, \ldots, T_m} \left\| \left( \frac{X' M_{\bar Z} X}{T} \right)^{-1} \right\| = O_p(1),$$

where the supremum with respect to $(T_1, \ldots, T_m)$ is taken over all possible partitions
such that $|T_{i-1} - T_i| \ge q$ $(i = 1, \ldots, m+1)$; the matrix $\bar Z$ diagonally partitions $Z$ at
$(T_1, \ldots, T_m)$.

Proof: We have the identity $X' M_{\bar Z} X = X_1' M_{Z_1} X_1 + \cdots + X_{m+1}' M_{Z_{m+1}} X_{m+1}$. An
$m$-break model has $m+1$ regimes. Each partition $(T_1, \ldots, T_m)$ leaves at least one true
regime uncut. In other words, there exist an $i$ and a $j$ such that $(X_i, Z_i)$ contains $(X_j^0, Z_j^0)$
as a submatrix. By Lemma A.1, $X_i' M_{Z_i} X_i \ge X_j^{0\prime} M_{Z_j^0} X_j^0$. Hence $(X' M_{\bar Z} X/T)^{-1} \le
(X_j^{0\prime} M_{Z_j^0} X_j^0/T)^{-1}$. This implies $\|(X' M_{\bar Z} X/T)^{-1}\| \le \max_j \|(X_j^{0\prime} M_{Z_j^0} X_j^0/T)^{-1}\|$ for all
partitions. The lemma now follows from assumption A1. □

It should be pointed out that $(\bar Z' M_X \bar Z/T)^{-1}$ is not uniformly stochastically bounded
over all possible partitions. In fact, it is easy to see that this matrix is $O_p(T)$ for some
partitions.

Lemma A.3 Under assumption A1,

$$\sup_{T_1, \ldots, T_m} \| X' M_{\bar Z} \bar Z_0 \| = O_p(T).$$

Proof: Because $M_{\bar Z}$ is a projection matrix, we have $\|X' M_{\bar Z} \bar Z_0\| \le \|X\| \|M_{\bar Z} \bar Z_0\| \le
\|X\| \|\bar Z_0\|$ uniformly over all partitions. The lemma follows from $\|X\| \le \sqrt{\mathrm{tr}(X'X)} =
O_p(\sqrt{T})$ and similarly $\|\bar Z_0\| = O_p(\sqrt{T})$. □

Lemma A.4 The following identity holds:

$$(\bar Z' M_X \bar Z)^{-1} = (\bar Z' \bar Z)^{-1} + (\bar Z' \bar Z)^{-1} (\bar Z' X)(X' M_{\bar Z} X)^{-1} X' \bar Z (\bar Z' \bar Z)^{-1}.$$

Proof: Follows from direct verification. □
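
For readers who prefer a named result to direct verification, the identity in Lemma A.4 can also be read as an instance of the Woodbury matrix identity. The shorthand $A$, $B$, $C$ below is introduced only for this remark and is not notation from the paper; the inverses are assumed to exist.

```latex
% Shorthand (for this remark only): A = \bar Z'\bar Z, B = \bar Z'X, C = X'X.
\begin{align*}
\bar Z' M_X \bar Z &= A - B C^{-1} B', \qquad X' M_{\bar Z} X = C - B' A^{-1} B, \\
(A - B C^{-1} B')^{-1} &= A^{-1} + A^{-1} B \,(C - B' A^{-1} B)^{-1} B' A^{-1}
  \quad \text{(Woodbury identity)}, \\
\text{so}\quad (\bar Z' M_X \bar Z)^{-1} &= (\bar Z'\bar Z)^{-1}
  + (\bar Z'\bar Z)^{-1} \bar Z' X \,(X' M_{\bar Z} X)^{-1} X' \bar Z \,(\bar Z'\bar Z)^{-1}.
\end{align*}
```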

Lemma A.5 Under assumption A4, there exists $\alpha < 1/2$ such that

$$\sup_{T_1, \ldots, T_m} \| \bar Z (\bar Z' \bar Z)^{-1} \bar Z' U \| = O_p(T^{\alpha}),$$

where the supremum with respect to $(T_1, \ldots, T_m)$ is taken over all possible partitions
such that $|T_{i-1} - T_i| \ge q$ $(i = 1, \ldots, m+1)$ under assumption A4(i) and over partitions
such that $|T_{i-1} - T_i| \ge \epsilon T$ for some $\epsilon > 0$ under assumption A4(ii).

Proof: Consider first the case where part (i) of assumption A4 is assumed to hold.
Because of the independence assumption between $z_s$ and $u_t$, we can treat the $z_t$'s as
nonstochastic, otherwise conditional arguments can be used. Let $P_{\bar Z} = \bar Z (\bar Z' \bar Z)^{-1} \bar Z'$.
We shall prove that $|U' P_{\bar Z} U| = O_p(T^{2\alpha})$ uniformly in $T_1, \ldots, T_m$. However, $U' P_{\bar Z} U$ is
the summation of the $m+1$ terms

$$\Big( \sum_{t=T_i+1}^{T_{i+1}} z_t u_t \Big)' \Big( \sum_{t=T_i+1}^{T_{i+1}} z_t z_t' \Big)^{-1} \Big( \sum_{t=T_i+1}^{T_{i+1}} z_t u_t \Big) \qquad (i = 0, \ldots, m).$$

Thus it suffices to prove that

$$\sup_{1 \le k < \ell \le T} \Big\| \sum_{t=k}^{\ell} \xi_t \Big\| = O_p(T^{\alpha}) \tag{31}$$

with $\ell - k \ge q$ and where $\xi_t$ is a $q \times 1$ vector defined by $\xi_t = \xi_t(k, \ell) = (A_{k\ell})^{-1/2} z_t u_t$
with $A_{k\ell} = \sum_{j=k}^{\ell} z_j z_j'$. For notational simplicity, the dependence of $\xi_t$ on $k$ and $\ell$ will
not be exhibited. Now


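$$P\Big( \sup_{1 \le k < \ell \le T} \Big\| \sum_{t=k}^{\ell} \xi_t \Big\| > T^{\alpha} \Big) \le \sum_{k=1}^{T} \sum_{\ell=k+q}^{T} P\Big( \Big\| \sum_{t=k}^{\ell} \xi_t \Big\| > T^{\alpha} \Big) \le T^{-2\alpha s} \sum_{k=1}^{T} \sum_{\ell=k+q}^{T} E \Big\| \sum_{t=k}^{\ell} \xi_t \Big\|^{2s}. \tag{32}$$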


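By the mixingale property, we can write $u_t$ as

$$u_t = \sum_{j=-\infty}^{\infty} u_{jt}, \quad \text{with } u_{jt} = E(u_t | \mathcal{F}_{t-j}) - E(u_t | \mathcal{F}_{t-j-1}),$$

and for each $j$, $\{u_{jt}, \mathcal{F}_{t-j}\}$ is a sequence of martingale differences. Using this
decomposition, we have

$$\sum_{t=k}^{\ell} \xi_t = \sum_{j=-\infty}^{\infty} \sum_{t=k}^{\ell} \xi_{jt},$$

where $\xi_{jt} = (A_{k\ell})^{-1/2} z_t u_{jt}$. By Minkowski's inequality,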


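$$E \Big\| \sum_{t=k}^{\ell} \xi_t \Big\|^{2s} \le \Bigg[ \sum_{j=-\infty}^{\infty} \Big( E \Big\| \sum_{t=k}^{\ell} \xi_{jt} \Big\|^{2s} \Big)^{1/2s} \Bigg]^{2s}. \tag{33}$$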


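A key point is that for fixed $j$, $k$, and $\ell$, $\{\xi_{jt}, \mathcal{F}_{t-j}\}$ $(t = k, \ldots, \ell)$ form a sequence of
martingale differences. Thus by Burkholder's inequality (Hall and Heyde 1981, p. 23)
there exists a $C > 0$, only depending on $q$ and $s$, such that

$$E \Big\| \sum_{t=k}^{\ell} \xi_{jt} \Big\|^{2s} \le C\, E \Big( \sum_{t=k}^{\ell} \|\xi_{jt}\|^2 \Big)^{s} \le C \Big( \sum_{t=k}^{\ell} \big( E \|\xi_{jt}\|^{2s} \big)^{1/s} \Big)^{s}, \tag{34}$$

where the second step follows by Minkowski's inequality. Now $\|\xi_{jt}\|^2 = z_t' (A_{k\ell})^{-1} z_t u_{jt}^2$.
Thus $(E\|\xi_{jt}\|^{2s})^{1/s} = z_t' (A_{k\ell})^{-1} z_t (E|u_{jt}|^{2s})^{1/s}$. By A4(a), for $r = 2s$, we can show (see
Hansen 1991)

$$(E|u_{jt}|^{2s})^{1/2s} \le 2 c_t \psi_j \le 2 (\max_i c_i) \psi_j \le K \psi_j \quad \text{for all } j.$$

It follows that $(E\|\xi_{jt}\|^{2s})^{1/s} \le z_t' (A_{k\ell})^{-1} z_t K^2 \psi_j^2$. Thus from (34),

$$E \Big\| \sum_{t=k}^{\ell} \xi_{jt} \Big\|^{2s} \le C \Big( \sum_{t=k}^{\ell} z_t' (A_{k\ell})^{-1} z_t K^2 \psi_j^2 \Big)^{s} = C (K \psi_j)^{2s} q^{s}, \tag{35}$$

where we have used the fact that $\sum_{t=k}^{\ell} z_t' (A_{k\ell})^{-1} z_t = \mathrm{trace}\big((A_{k\ell})^{-1} \sum_{t=k}^{\ell} z_t z_t'\big) = \mathrm{trace}(I) =
q$. Combining (35) and (33), we have

$$E \Big\| \sum_{t=k}^{\ell} \xi_t \Big\|^{2s} \le C q^{s} K^{2s} \Big( \sum_{j=-\infty}^{\infty} \psi_j \Big)^{2s} < \infty.$$

Note that the right hand side above does not depend on $k$ and $\ell$. This implies, in view
of (32), that with $\ell - k \ge q$,

$$P\Big( \sup_{1 \le k < \ell \le T} \Big\| \sum_{t=k}^{\ell} \xi_t \Big\| > T^{\alpha} \Big) \le C_1 T^{-2\alpha s + 2}$$

for some $C_1 > 0$. Letting $s = 2 + \delta/2$ (the moment of order $4 + \delta$ of $u_t$ exists by assumption
A4), we can choose $\alpha \in (0, 1/2)$ such that $T^{-2\alpha s + 2} \to 0$. This proves (31) and hence
the lemma.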

Consider now the case where part (ii) of assumption A4 is assumed to hold. In
this case

$$T^{-1} \sum_{t=[Tu]+1}^{[Tv]} z_t z_t' \to Q(v) - Q(u),$$

and hence $(T^{-1} \sum_{t=[Tu]+1}^{[Tv]} z_t z_t')^{-1} \to (Q(v) - Q(u))^{-1}$ uniformly in $v$ and $u$ such that
$v - u \ge \epsilon > 0$. Also,

$$T^{-1/2} \sum_{t=[Tu]+1}^{[Tv]} z_t u_t = O_p(1)$$

uniformly using a functional central limit theorem for martingale differences.
Accordingly, $|U' P_{\bar Z} U| = O_p(1)$ uniformly in $T_1, \ldots, T_m$ and the statement of the lemma
holds with $\alpha = 0$. □

Lemma A.6 Under assumptions A1-A4, we have for some $\alpha < 1/2$,

$$\sup_{T_1, \ldots, T_m} \| X' \bar Z (\bar Z' \bar Z)^{-1} \bar Z' U \| = O_p(T^{\alpha + 1/2}). \tag{36}$$

Proof: This follows from Lemma A.5, $\|X\| = O_p(T^{1/2})$ and $\|X' \bar Z (\bar Z' \bar Z)^{-1} \bar Z' U\| \le
\|X\| \|\bar Z (\bar Z' \bar Z)^{-1} \bar Z' U\|$. □

Note that the same argument leads to

$$\sup_{T_1, \ldots, T_m} \| \bar Z_0' \bar Z (\bar Z' \bar Z)^{-1} \bar Z' U \| = O_p(T^{\alpha + 1/2}). \tag{37}$$

Proof of Lemma 1: By the definition of $d_t$,

$$\sum_{t=1}^{T} u_t d_t = U'X(\hat\beta - \beta^0) + U' Z^* \hat\delta - U' \bar Z_0 \delta^0,$$

where $Z^*$ diagonally partitions $Z$ at $(\hat T_1, \ldots, \hat T_m)$. Because $U'X\beta^0 = O_p(T^{1/2})$ and
$U' \bar Z_0 = O_p(T^{1/2})$ (these terms do not depend on $(T_1, \ldots, T_m)$), to prove the lemma, it
suffices to show $T^{-1} U'X\hat\beta = o_p(1)$ and $T^{-1} U' Z^* \hat\delta = o_p(1)$. We shall prove a stronger
result. Let $(T_1, \ldots, T_m)$ be an arbitrary partition and $\bar Z$ be the associated diagonal
partition of $Z$. Also let $\hat\beta(\{T_j\})$ and $\hat\delta(\{T_j\})$ be, respectively, the estimates of $\beta$ and
$\delta$ corresponding to this same partition. We shall prove

$$\sup_{T_1, \ldots, T_m} \frac{1}{T}\, U'X \hat\beta(\{T_j\}) = o_p(1), \tag{38}$$

$$\sup_{T_1, \ldots, T_m} \frac{1}{T}\, U' \bar Z \hat\delta(\{T_j\}) = o_p(1), \tag{39}$$

where the supremum with respect to $(T_1, \ldots, T_m)$ is taken over the same partitions as
those in Lemma A.5. First consider (38). The estimator $\hat\beta(\{T_j\})$ can be written as

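$$\hat\beta(\{T_j\}) = (X' M_{\bar Z} X)^{-1} X' M_{\bar Z} Y = \beta^0 + (X' M_{\bar Z} X)^{-1} X' M_{\bar Z} \bar Z_0 \delta^0 + (X' M_{\bar Z} X)^{-1} X' M_{\bar Z} U. \tag{40}$$

Using the argument of Lemma A.3, we deduce that $X' M_{\bar Z} U = O_p(T)$. This together
with Lemmas A.2 and A.3 implies that $\hat\beta(\{T_j\}) = O_p(1)$ uniformly over all partitions.
Hence $T^{-1} U'X \hat\beta(\{T_j\}) = O_p(T^{-1/2})$ uniformly over all partitions, obtaining (38).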

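Next, consider (39). From $\hat\delta(\{T_j\}) = (\bar Z' M_X \bar Z)^{-1} \bar Z' M_X Y$ and $M_X X = 0$, we
obtain

$$U' \bar Z \hat\delta(\{T_j\}) = U' \bar Z (\bar Z' M_X \bar Z)^{-1} \bar Z' M_X \bar Z_0 \delta^0 + U' \bar Z (\bar Z' M_X \bar Z)^{-1} \bar Z' M_X U = (I) + (II). \tag{41}$$

By Lemma A.4,

$$(I) = U' \bar Z (\bar Z' \bar Z)^{-1} \bar Z' M_X \bar Z_0 \delta^0 + U' \bar Z (\bar Z' \bar Z)^{-1} \bar Z' X (X' M_{\bar Z} X)^{-1} X' P_{\bar Z} M_X \bar Z_0 \delta^0. \tag{42}$$

Because $P_{\bar Z}$ and $M_X$ are projection matrices, $\|M_X \bar Z_0\| \le \|\bar Z_0\| = O_p(T^{1/2})$ and
$\|X' P_{\bar Z} M_X \bar Z_0\| \le \|X\| \|\bar Z_0\| = O_p(T)$. Now, we have $(I) = O_p(T^{\alpha + 1/2})$ uniformly
over all partitions using Lemmas A.2, A.5, and A.6.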

It remains to derive a bound for $(II)$. Note that $(I)$ and $(II)$ differ only in
that $\bar Z_0 \delta^0$ is replaced by $U$. A similar argument shows that $(II)$ is of a lower order
of magnitude, more specifically $(II) = O_p(T^{2\alpha})$ (the details are omitted to avoid
repetition). Because $2\alpha < \alpha + 1/2$, we have $U' \bar Z \hat\delta(\{T_j\}) = O_p(T^{\alpha + 1/2})$. Equivalently,
$T^{-1} U' \bar Z \hat\delta(\{T_j\}) = O_p(T^{\alpha - 1/2}) = o_p(1)$ uniformly over all partitions. This proves (39)
and, hence, the proof of Lemma 1 is complete. □

Proof of Lemma 2: If there exists a break, say, $\lambda_j^0$, which cannot be consistently
estimated, then with some positive probability $\epsilon_0 > 0$ there exists a positive number
$\eta > 0$ such that no estimated break falls in the interval $[T(\lambda_j^0 - \eta), T(\lambda_j^0 + \eta)]$ for a
subsequence of $T$ (without loss of generality, assume this subsequence is the same as
$T$). Suppose this interval is classified into the $k$-th regime, namely, $\hat T_{k-1} < T(\lambda_j^0 - \eta)$
and $T(\lambda_j^0 + \eta) < \hat T_k$. Then $d_t = x_t'(\hat\beta - \beta^0) + z_t'(\hat\delta_k - \delta_j^0)$ for $t \in [T(\lambda_j^0 - \eta), T\lambda_j^0]$ and
$d_t = x_t'(\hat\beta - \beta^0) + z_t'(\hat\delta_k - \delta_{j+1}^0)$ for $t \in [T\lambda_j^0 + 1, T(\lambda_j^0 + \eta)]$. We have


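$$\sum_{t=1}^{T} d_t^2 \ge \sum\nolimits_1 d_t^2 + \sum\nolimits_2 d_t^2 = \begin{pmatrix} \hat\beta - \beta^0 \\ \hat\delta_k - \delta_j^0 \end{pmatrix}' \begin{pmatrix} \sum_1 x_t x_t' & \sum_1 x_t z_t' \\ \sum_1 z_t x_t' & \sum_1 z_t z_t' \end{pmatrix} \begin{pmatrix} \hat\beta - \beta^0 \\ \hat\delta_k - \delta_j^0 \end{pmatrix} + \begin{pmatrix} \hat\beta - \beta^0 \\ \hat\delta_k - \delta_{j+1}^0 \end{pmatrix}' \begin{pmatrix} \sum_2 x_t x_t' & \sum_2 x_t z_t' \\ \sum_2 z_t x_t' & \sum_2 z_t z_t' \end{pmatrix} \begin{pmatrix} \hat\beta - \beta^0 \\ \hat\delta_k - \delta_{j+1}^0 \end{pmatrix} \tag{43}$$

where $\sum_1$ extends over the set $T(\lambda_j^0 - \eta) \le t \le T\lambda_j^0$ and $\sum_2$ extends over the set
$T\lambda_j^0 + 1 \le t \le T(\lambda_j^0 + \eta)$. Let $\gamma_T$ be the smallest eigenvalue of the first matrix in (43)
and $\gamma_T^*$ be the smallest eigenvalue of the second matrix in (43). Then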


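$$\sum\nolimits_1 d_t^2 + \sum\nolimits_2 d_t^2 \ge \gamma_T \big[ \|\hat\beta - \beta^0\|^2 + \|\hat\delta_k - \delta_j^0\|^2 \big] + \gamma_T^* \big[ \|\hat\beta - \beta^0\|^2 + \|\hat\delta_k - \delta_{j+1}^0\|^2 \big]$$

$$\ge \min\{\gamma_T, \gamma_T^*\} \big( \|\hat\delta_k - \delta_j^0\|^2 + \|\hat\delta_k - \delta_{j+1}^0\|^2 \big) \ge \tfrac{1}{2} \min\{\gamma_T, \gamma_T^*\} \|\delta_j^0 - \delta_{j+1}^0\|^2.$$

The last inequality follows from $(x-a)'A(x-a) + (x-b)'A(x-b) \ge (1/2)(a-b)'A(a-b)$
for an arbitrary positive definite matrix $A$ and for all $x$. Now the first matrix in
(43) can be written as $(T\eta) \frac{1}{T\eta} \sum_{t=T(\lambda_j^0 - \eta)}^{T\lambda_j^0} w_t w_t' = (T\eta) A_T$, say. By assumption A2, the
smallest eigenvalue of $A_T$ is bounded away from zero. Thus the smallest eigenvalue
of $(T\eta) A_T$, $\gamma_T$, is of the order $T\eta$. The same can be said for $\gamma_T^*$. Therefore, $\sum_{t=1}^{T} d_t^2 \ge
T\eta C_1 \|\delta_j^0 - \delta_{j+1}^0\|^2 = TC \|\delta_j^0 - \delta_{j+1}^0\|^2$ for some $C = \eta C_1 > 0$ with probability no less
than $\epsilon_0 > 0$. □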
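
The quadratic inequality invoked in the proof of Lemma 2 can be verified in two lines by centering at the midpoint of $a$ and $b$. The symbols $c$ and $d$ below are introduced only for this check, and $A$ is taken to be symmetric positive definite.

```latex
% Let c = (a + b)/2 and d = (a - b)/2, so that x - a = (x - c) - d and x - b = (x - c) + d.
\begin{align*}
(x-a)'A(x-a) + (x-b)'A(x-b)
  &= 2\,(x-c)'A(x-c) + 2\,d'Ad
     \quad \text{(the cross terms $\mp 2(x-c)'Ad$ cancel)} \\
  &\ge 2\,d'Ad = \tfrac{1}{2}\,(a-b)'A(a-b),
\end{align*}
% with equality if and only if x = (a + b)/2.
```

In the proof the inequality is applied with $A = I$, $x = \hat\delta_k$, $a = \delta_j^0$ and $b = \delta_{j+1}^0$.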

Proof of Proposition 2: Because $\hat\lambda_k$ $(k = 1, \ldots, m)$ is consistent by Proposition
1, with large probability, $\hat\delta_k$ is estimated using at least a positive fraction of the
observations from $[T_{k-1}^0, T_k^0]$, say using $t \in [T(\lambda_{k-1}^0 + \epsilon), T(\lambda_k^0 - \epsilon)]$. Over this interval,
$d_t = x_t'(\hat\beta - \beta^0) + z_t'(\hat\delta_k - \delta_k^0)$. Hence $\sum_{t=1}^{T} d_t^2 \ge \sum_* d_t^2$ with $\sum_*$ extending over the same
interval. If either $\hat\beta$ or $\hat\delta_k$ is not consistent, then, with some positive probability, either
$\|\hat\beta - \beta^0\| > a$ or $\|\hat\delta_k - \delta_k^0\| > a$ will be true for some $a > 0$. A similar argument as in the
proof of Proposition 1 leads to $\sum_* d_t^2 \ge T(\lambda_k^0 - \lambda_{k-1}^0 - 2\epsilon) a^2 C$ for some $C > 0$, with
some positive probability. This again gives rise to a contradiction with (4) in view of
(5) and Lemma 1. □

Proof of Proposition 3: Without loss of generality, we assume there are only
three breaks ($m = 3$) and provide an explicit proof of $T$-consistency for $\hat\lambda_2$ only. The
analysis for $\hat\lambda_1$ and $\hat\lambda_3$ is virtually the same (and actually simpler) and is thus omitted.

By the consistency result of Proposition 1, for each $\epsilon > 0$ and $T$ large, we have
$|\hat T_k - T_k^0| < \epsilon T$, with high probability. Therefore we only need to examine the behavior
of the sum of squared residuals, $S_T(T_1, T_2, T_3)$, for those $T_i$ that are close to the true
breaks such that $|T_i - T_i^0| < \epsilon T$ for all $i$. Also using an argument of symmetry, we
can, without loss of generality, consider the case $T_2 < T_2^0$. For $C > 0$, define

$$V_\epsilon(C) = \{ (T_1, T_2, T_3);\ |T_i - T_i^0| < \epsilon T,\ 1 \le i \le 3,\ T_2 - T_2^0 < -C \}.$$

Because $S_T(\hat T_1, \hat T_2, \hat T_3) \le S_T(\hat T_1, T_2^0, \hat T_3)$ with probability 1, to prove the proposition it
is enough to show that for each $\eta > 0$, there exist $C > 0$ and $\epsilon > 0$ such that for large $T$,

$$P\big( \min\{ S_T(T_1, T_2, T_3) - S_T(T_1, T_2^0, T_3) \} \le 0 \big) < \eta, \tag{44}$$

where the minimum is taken over the set $V_\epsilon(C)$. Such a relation would imply that
for a large $C$, global optimization cannot be achieved on $V_\epsilon(C)$. Thus with large
probability, $|\hat T_2 - T_2^0| \le C$. Now denote

$$SSR_1 = S_T(T_1, T_2, T_3),$$

$$SSR_2 = S_T(T_1, T_2^0, T_3),$$

and introduce

$$SSR_3 = S_T(T_1, T_2, T_2^0, T_3).$$



By definition, we have

$$S_T(T_1, T_2, T_3) - S_T(T_1, T_2^0, T_3) = (SSR_1 - SSR_3) - (SSR_2 - SSR_3). \tag{45}$$

This latter relation is useful because it allows us to carry the analysis in terms of two
problems involving a single structural change. Indeed, note that $SSR_1 - SSR_3$ is the
difference in the sums of squared residuals allowing an additional fourth break at time
$T_2^0$ between the breaks $T_2$ and $T_3$. Similarly, $SSR_2 - SSR_3$ is the difference in the
sums of squared residuals allowing an additional fourth break at time $T_2$ between the
breaks $T_1$ and $T_2^0$. Hence, in each case $SSR_1$ and $SSR_2$ can be viewed as the sum of
squared residuals from a constrained version of a more general model whose sum of
squared residuals is $SSR_3$.

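It is then easy to derive exact expressions for the two components on the right hand side of (45) in terms of estimated coefficients. Consider first $SSR_1 - SSR_3$; we have (e.g., Amemiya 1985, p. 31):

$$S_T(T_1, T_2, T_3) - S_T(T_1, T_2, T_2^0, T_3) = (\hat\delta_3^* - \hat\delta_\Delta)' Z_\Delta' M_{\bar W} Z_\Delta (\hat\delta_3^* - \hat\delta_\Delta),$$

where $\bar W = (X, \bar Z)$, with $\bar Z$ the diagonal partition of $Z$ at $(T_1, T_2, T_3)$, $\hat\delta_3^*$ is the vector of estimated coefficients associated with the regressors $(0, \ldots, 0, z_{T_2^0+1}, \ldots, z_{T_3}, 0, \ldots, 0)'$, and $\hat\delta_\Delta$ is the vector of estimated coefficients associated with the regressors $Z_\Delta = (0, \ldots, 0, z_{T_2+1}, \ldots, z_{T_2^0}, 0, \ldots, 0)'$ (see Figure 3). Similarly, we have for $SSR_2 - SSR_3$:

$$S_T(T_1, T_2^0, T_3) - S_T(T_1, T_2, T_2^0, T_3) = (\hat\delta_2 - \hat\delta_\Delta)' Z_\Delta' M_{\tilde W} Z_\Delta (\hat\delta_2 - \hat\delta_\Delta),$$

where $\tilde W = (X, \tilde Z)$ with $\tilde Z$ the diagonal partition of $Z$ at $(T_1, T_2^0, T_3)$, and $\hat\delta_2$ is the vector of estimated coefficients associated with the regressors $(0, \ldots, 0, z_{T_1+1}, \ldots, z_{T_2}, 0, \ldots, 0)'$ (again, see Figure 3). Thus

(46)  $$SSR_1 - SSR_2 = (\hat\delta_3^* - \hat\delta_\Delta)' Z_\Delta' M_{\bar W} Z_\Delta (\hat\delta_3^* - \hat\delta_\Delta) - (\hat\delta_2 - \hat\delta_\Delta)' Z_\Delta' M_{\tilde W} Z_\Delta (\hat\delta_2 - \hat\delta_\Delta).$$

The second term on the right is bounded by $(\hat\delta_2 - \hat\delta_\Delta)' Z_\Delta' Z_\Delta (\hat\delta_2 - \hat\delta_\Delta)$ because $Z_\Delta' M_{\tilde W} Z_\Delta \le Z_\Delta' Z_\Delta$, $M_{\tilde W}$ being a projection matrix. Expanding the first term on the right hand side of (46), we have

(47)  $$\begin{aligned} SSR_1 - SSR_2 \ge{} & (\hat\delta_3^* - \hat\delta_\Delta)' Z_\Delta' Z_\Delta (\hat\delta_3^* - \hat\delta_\Delta) - (\hat\delta_3^* - \hat\delta_\Delta)' Z_\Delta' \bar W (\bar W' \bar W)^{-1} \bar W' Z_\Delta (\hat\delta_3^* - \hat\delta_\Delta) \\ & - (\hat\delta_2 - \hat\delta_\Delta)' Z_\Delta' Z_\Delta (\hat\delta_2 - \hat\delta_\Delta) \equiv (I) - (II) - (III). \end{aligned}$$

Consider the limiting behavior of term (I). Note first that the estimate $\hat\delta_3^*$ will be close to $\delta_3^0$ given that, on the set $T_\epsilon(C)$, the distance between $T_i$ and $T_i^0$ can be controlled and made small by choosing a small $\epsilon$. Noting that $\hat\delta_\Delta$ is estimated using observations from the second true regime only, $\hat\delta_\Delta$ is close to $\delta_2^0$ for a large enough $C$, on $T_\epsilon(C)$. Hence, for large $C$, large $T$ and small $\epsilon$, expression (I) is no less than $(1/2)(\delta_3^0 - \delta_2^0)' Z_\Delta' Z_\Delta (\delta_3^0 - \delta_2^0)$ with large probability.

Next consider term (II). By the strong law of large numbers, it is easy to argue that on $T_\epsilon(C)$, $\hat\delta_3^*$ and $\hat\delta_\Delta$ are $O_p(1)$ uniformly. Also on $T_\epsilon(C)$, $(\bar W' \bar W / T)^{-1} = O_p(1)$ and $Z_\Delta' \bar W / C = O_p(1)$ (because $Z_\Delta' \bar W$ involves no more than $C$ observations). Thus (II) is no larger than $(1/T) C^2 O_p(1)$.

Consider finally (III). Because both $\hat\delta_2$ and $\hat\delta_\Delta$ are close to $\delta_2^0$, $\|\hat\delta_2 - \hat\delta_\Delta\| \le \rho$ with large probability for any given small number $\rho > 0$ (this is true for large $T$, large $C$ and small $\epsilon$). Thus (III) is no larger than $\rho\, \iota' Z_\Delta' Z_\Delta \iota$, with $\iota$ a vector of 1's.

In summary, we have that the inequality

(48)  $$SSR_1 - SSR_2 \ge (1/2)(\delta_3^0 - \delta_2^0)' Z_\Delta' Z_\Delta (\delta_3^0 - \delta_2^0) - \frac{1}{T} C^2 O_p(1) - \rho\, \iota' Z_\Delta' Z_\Delta \iota$$

holds with large probability. The first term on the right hand side of (48) is of the same order of magnitude as $C$ since the smallest eigenvalue of $Z_\Delta' Z_\Delta / C$ is bounded away from zero by A2. The other two terms are dominated by the first term and thus, with large probability, $SSR_1 - SSR_2 > 0$. This proves (44) and thus the proposition.  □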

Proof of Proposition 7: Note that we can write:

$$F_T(\lambda_1, \ldots, \lambda_k; q) = \left(\frac{T - (k+1)q - p}{kq}\right) \frac{SSR_0 - SSR_k}{SSR_k},$$

where $SSR_0$ and $SSR_k$ are the sums of squared residuals under the null hypothesis and under the alternative allowing $k$ breaks, respectively. We have $(T - (k+1)q - p)^{-1} SSR_k \to_p \sigma^2$. Hence, we concentrate on the limit of $F_T^* = SSR_0 - SSR_k$. Now, let $D^U(i,j)$ ($D^R(i,j)$, resp.) be the sum of squared residuals from the unrestricted (restricted, resp.) model using data from segments $i$ to $j$ (inclusively), i.e. from observation $T_{i-1} + 1$ to $T_j$ (these notations have different meanings from the $D(i,j)$ defined in Section 4.3, where $i$ and $j$ refer to the numbering of the observations, not the numbering of segments). We can then write:

$$F_T^* = D^R(1, k+1) - \sum_{i=1}^{k+1} D^U(i,i),$$

or

(49)  $$F_T^* = \sum_{i=1}^{k} \big[ D^R(1, i+1) - D^R(1, i) - D^U(i+1, i+1) \big] + D^R(1,1) - D^U(1,1).$$

Consider first the estimate of the coefficients on the $x$'s. Let $\hat\beta_U$ and $\hat\beta_R$ be the estimates of $\beta$ in the unrestricted and restricted models, respectively. We have $\hat\beta_U = (X' M_{\bar Z} X)^{-1} X' M_{\bar Z} Y$ and $\hat\beta_R = (X' M_Z X)^{-1} X' M_Z Y$ where $Z = (z_1, \ldots, z_T)'$. We need to introduce further notation. Let $Y_{1,j}$, $U_{1,j}$, $X_{1,j}$ and $Z_{1,j}$ denote the corresponding vectors or matrices containing elements belonging to the partition from segment 1 to segment $j$ (inclusively). Also, let $Y_j$, $U_j$, $X_j$ and $Z_j$ be the vectors or matrices containing elements from segment $j$ only. Now, let $\hat\delta_{R,j}$ be the estimate of $\delta$ using data on the $z$'s from segments 1 to $j$ in the restricted model. Also, let $\hat\delta_j^U$ be the estimate of $\delta_j$ using data on the $z$'s from segment $j$ only in the unrestricted model. We have

$$\hat\delta_j^U = (Z_j' Z_j)^{-1} Z_j' (Y_j - X_j \hat\beta_U).$$

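Using the fact that, under the null hypothesis, $Y = X\beta + Z\delta + U = X\beta + \bar Z \bar\delta + U$ and $Y_j = X_j\beta + Z_j\delta + U_j$ (with $\bar\delta = (\delta', \ldots, \delta')'$ a $(k+1)q$ vector, $\delta$ being defined by $\delta_1 = \delta_2 = \ldots = \delta_{k+1} = \delta$), straightforward algebra yields

$$D^R(1,j) = \| (I - P_{Z_{1,j}})(U_{1,j} - X_{1,j} A_T) \|^2,$$
$$D^U(j,j) = \| (I - P_{Z_j})(U_j - X_j \bar A_T) \|^2,$$

where

$$A_T = (X' M_Z X)^{-1} X' M_Z U,$$
$$\bar A_T = (X' M_{\bar Z} X)^{-1} X' M_{\bar Z} U.$$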

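Consider the $i$th element in the summation defining $F_T^*$ in (49); we have

$$\begin{aligned} F_{T,i} &= D^R(1, i+1) - D^R(1, i) - D^U(i+1, i+1) \\ &= \| (I - P_{Z_{1,i+1}})(U_{1,i+1} - X_{1,i+1} A_T) \|^2 - \| (I - P_{Z_{1,i}})(U_{1,i} - X_{1,i} A_T) \|^2 \\ &\quad - \| (I - P_{Z_{i+1}})(U_{i+1} - X_{i+1} \bar A_T) \|^2 . \end{aligned}$$

To simplify the exposition, we introduce the notation $S_j = Z_{1,j}' U_{1,j}$, $H_j = Z_{1,j}' Z_{1,j}$, $K_j = Z_{1,j}' X_{1,j}$, $L_j = X_{1,j}' X_{1,j}$, and $M_j = X_{1,j}' U_{1,j}$. Noting that

$$U_{1,i+1}' U_{1,i+1} = U_{1,i}' U_{1,i} + U_{i+1}' U_{i+1},$$

$$U_{1,i+1}' X_{1,i+1} = U_{1,i}' X_{1,i} + U_{i+1}' X_{i+1},$$

we deduce that

$$\begin{aligned} F_{T,i} ={} & -S_{i+1}' H_{i+1}^{-1} S_{i+1} + S_i' H_i^{-1} S_i + (S_{i+1} - S_i)' [H_{i+1} - H_i]^{-1} (S_{i+1} - S_i) \\ & + 2 S_{i+1}' H_{i+1}^{-1} K_{i+1} A_T - 2 S_i' H_i^{-1} K_i A_T \\ & - 2 (S_{i+1} - S_i)' [H_{i+1} - H_i]^{-1} (K_{i+1} - K_i) \bar A_T \\ & + 2 (M_{i+1} - M_i)' (\bar A_T - A_T) + (A_T - \bar A_T)' (L_{i+1} - L_i) (A_T - \bar A_T). \end{aligned}$$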

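Using the stated assumptions, we have the following basic convergence results:

1) $T^{-1/2} (X_{1,j}, Z_{1,j})' U_{1,j} \Rightarrow \sigma (B_1(\lambda_j), B_2(\lambda_j))' \equiv \sigma B(\lambda_j)$, where $B(r)$ is a $(q+p)$ dimensional vector Brownian motion with covariance matrix

$$Q = \begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix};$$

2) $T^{-1} (X_{1,j}, Z_{1,j})' (X_{1,j}, Z_{1,j}) \to_p \sigma^2 \lambda_j Q$.

From these two limits, we deduce easily the following results:

a) $T^{-1/2} S_j \Rightarrow \sigma B_2(\lambda_j)$;

b) $T^{-1} H_j \to_p \sigma^2 \lambda_j Q_{22}$;

c) $T^{-1} K_j \to_p \sigma^2 \lambda_j Q_{21}$;

d) $T^{-1} L_j \to_p \sigma^2 \lambda_j Q_{11}$;

e) $T^{-1/2} M_j \Rightarrow \sigma B_1(\lambda_j)$;

f)
$$\begin{aligned} T^{1/2} A_T &= (T^{-1} X'X - T^{-1} X'Z (T^{-1} Z'Z)^{-1} T^{-1} Z'X)^{-1} \\ &\quad \times (T^{-1/2} X'U - T^{-1} X'Z (T^{-1} Z'Z)^{-1} T^{-1/2} Z'U) \\ &\Rightarrow \sigma^{-1} (Q_{11} - Q_{12} Q_{22}^{-1} Q_{21})^{-1} (B_1(1) - Q_{12} Q_{22}^{-1} B_2(1)) \equiv A^*. \end{aligned}$$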


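It remains to consider the limit of $T^{1/2} \bar A_T$. Let $\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2 - \lambda_1, \ldots, 1 - \lambda_k\}$, a $(k+1)$ by $(k+1)$ diagonal matrix. We deduce that

i) $T^{-1} \bar Z' \bar Z \to_p \sigma^2 (\Lambda \otimes Q_{22})$;

ii) $T^{-1} X' \bar Z \to_p \sigma^2 (e' \Lambda \otimes Q_{12})$ where $e' = (1, 1, \ldots, 1)$, a $(k+1)$ vector;

iii) $T^{-1/2} \bar Z' U \Rightarrow \sigma \big(B_2(\lambda_1)', (B_2(\lambda_2) - B_2(\lambda_1))', \ldots, (B_2(1) - B_2(\lambda_k))'\big)' \equiv \sigma B^*$.

We then obtain

$$\begin{aligned} T^{1/2} \bar A_T &= [T^{-1} X'X - T^{-1} X' \bar Z (T^{-1} \bar Z' \bar Z)^{-1} T^{-1} \bar Z' X]^{-1} \\ &\quad \times [T^{-1/2} X'U - T^{-1} X' \bar Z (T^{-1} \bar Z' \bar Z)^{-1} T^{-1/2} \bar Z' U] \\ &\Rightarrow \sigma^{-1} [Q_{11} - (e' \Lambda \otimes Q_{12}) (\Lambda \otimes Q_{22})^{-1} (\Lambda e \otimes Q_{21})]^{-1} \\ &\quad \times [B_1(1) - (e' \Lambda \otimes Q_{12}) (\Lambda \otimes Q_{22})^{-1} B^*] \\ &= \sigma^{-1} [Q_{11} - (e' \Lambda e \otimes Q_{12} Q_{22}^{-1} Q_{21})]^{-1} [B_1(1) - (e' \otimes Q_{12} Q_{22}^{-1}) B^*] \\ &= \sigma^{-1} [Q_{11} - Q_{12} Q_{22}^{-1} Q_{21}]^{-1} [B_1(1) - Q_{12} Q_{22}^{-1} B_2(1)] = A^*. \end{aligned}$$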

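The second equality follows since $e' \Lambda e = 1$ and $(e' \otimes Q_{12} Q_{22}^{-1}) B^* = Q_{12} Q_{22}^{-1} B_2(1)$. Using the results stated above we easily deduce that

$$(M_{i+1} - M_i)' (\bar A_T - A_T) \Rightarrow 0,$$

$$(A_T - \bar A_T)' (L_{i+1} - L_i) (A_T - \bar A_T) \Rightarrow 0,$$

$$\begin{aligned} & S_{i+1}' H_{i+1}^{-1} K_{i+1} A_T - S_i' H_i^{-1} K_i A_T - (S_{i+1} - S_i)' [H_{i+1} - H_i]^{-1} (K_{i+1} - K_i) \bar A_T \\ &\quad \Rightarrow \sigma B_2(\lambda_{i+1})' Q_{22}^{-1} Q_{21} A^* - \sigma B_2(\lambda_i)' Q_{22}^{-1} Q_{21} A^* \\ &\qquad - \sigma (B_2(\lambda_{i+1}) - B_2(\lambda_i))' Q_{22}^{-1} Q_{21} A^* = 0. \end{aligned}$$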

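Hence, we are left with

$$F_{T,i} = -S_{i+1}' H_{i+1}^{-1} S_{i+1} + S_i' H_i^{-1} S_i + (S_{i+1} - S_i)' [H_{i+1} - H_i]^{-1} (S_{i+1} - S_i) + o_p(1),$$

and we deduce, using the fact that $B_2(\lambda_j) = \sigma Q_{22}^{1/2} W(\lambda_j)$ with $W(\lambda_j)$ a vector of $q$ independent standard Wiener processes,

$$\begin{aligned} F_{T,i} &\Rightarrow -B_2(\lambda_{i+1})' Q_{22}^{-1} B_2(\lambda_{i+1}) / \lambda_{i+1} + B_2(\lambda_i)' Q_{22}^{-1} B_2(\lambda_i) / \lambda_i \\ &\quad + (B_2(\lambda_{i+1}) - B_2(\lambda_i))' Q_{22}^{-1} (B_2(\lambda_{i+1}) - B_2(\lambda_i)) / (\lambda_{i+1} - \lambda_i) \\ &= -\sigma^2 \| W(\lambda_{i+1}) \|^2 / \lambda_{i+1} + \sigma^2 \| W(\lambda_i) \|^2 / \lambda_i \\ &\quad + \sigma^2 \| W(\lambda_{i+1}) - W(\lambda_i) \|^2 / (\lambda_{i+1} - \lambda_i) \\ &= \sigma^2 \| \lambda_i W(\lambda_{i+1}) - \lambda_{i+1} W(\lambda_i) \|^2 / \big(\lambda_{i+1} \lambda_i (\lambda_{i+1} - \lambda_i)\big). \end{aligned}$$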
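The last equality in this chain is a purely algebraic identity. A short check, using the shorthand $a = W(\lambda_{i+1})$, $b = W(\lambda_i)$, $s = \lambda_{i+1}$ and $r = \lambda_i$ (symbols introduced here only for this verification, with the common factor $\sigma^2$ dropped):

```latex
\begin{aligned}
-\frac{\|a\|^2}{s} + \frac{\|b\|^2}{r} + \frac{\|a-b\|^2}{s-r}
&= \frac{-r(s-r)\|a\|^2 + s(s-r)\|b\|^2
        + sr\,\bigl(\|a\|^2 - 2a'b + \|b\|^2\bigr)}{sr(s-r)} \\
&= \frac{r^2\|a\|^2 - 2sr\,a'b + s^2\|b\|^2}{sr(s-r)}
 = \frac{\|ra - sb\|^2}{sr(s-r)} .
\end{aligned}
```

Substituting back gives the stated limit, with numerator $\| \lambda_i W(\lambda_{i+1}) - \lambda_{i+1} W(\lambda_i) \|^2$.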

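Finally, note that

$$\begin{aligned} D^R(1,1) - D^U(1,1) &= \| (I - P_{Z_1})(U_1 - X_1 A_T) \|^2 - \| (I - P_{Z_1})(U_1 - X_1 \bar A_T) \|^2 \\ &= 2 U_1' (I - P_{Z_1}) X_1 (\bar A_T - A_T) \\ &\quad + A_T' X_1' (I - P_{Z_1}) X_1 A_T - \bar A_T' X_1' (I - P_{Z_1}) X_1 \bar A_T \\ &\Rightarrow 0. \end{aligned}$$

Hence

$$F_T^* \Rightarrow \sigma^2 \sum_{i=1}^{k} \frac{\| \lambda_i W(\lambda_{i+1}) - \lambda_{i+1} W(\lambda_i) \|^2}{\lambda_{i+1} \lambda_i (\lambda_{i+1} - \lambda_i)},$$

and the result of Proposition 7 follows.  □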

Proof of Proposition 9: Let $c_{T,\ell} = c(\alpha_T, \ell, T)$ be the critical value of the test $F_T(\ell+1|\ell)$ corresponding to a given size $\alpha_T$. By definition,

(50)  $$P\big(F_T(\ell+1|\ell) > c_{T,\ell} \mid \text{conditional on } \ell \text{ breaks}\big) \le \alpha_T,$$

for all $\ell$. Now suppose the true number of breaks is $m_0$. By the consistency of the sequential test, we have

(51)  $$P\big(F_T(\ell+1|\ell) > c_{T,\ell} \mid \text{conditional on } m_0 \text{ breaks}\big) \to 1, \quad \text{for } \ell = 0, 1, \ldots, m_0 - 1,$$

since $c_{T,\ell}$ increases slowly enough given the assumption on the rate of decrease of $\alpha_T$. Let $\hat m$ be the number of breaks estimated by the sequential procedure. The event $\{\hat m < m_0\}$ satisfies

$$\{\hat m < m_0\} \subset \bigcup_{k=0}^{m_0 - 1} A_{T,k},$$

where $A_{T,k} = \{F_T(k+1|k) \le c_{T,k}\}$. That is, in order to obtain $\hat m < m_0$, it must be the case that the hypothesis of $k$ against $k+1$ breaks cannot be rejected for some $k < m_0$. By (51), $P(A_{T,k} | m_0) \to 0$ as $T \to \infty$ for each $k < m_0$. It follows that

(52)  $$P(\hat m < m_0) \le \sum_{k=0}^{m_0 - 1} P(A_{T,k} | m_0) \to 0, \quad \text{as } T \to \infty.$$

Next, consider the event $\{\hat m > m_0\}$. In order to obtain $\hat m > m_0$, it must be the case that the hypothesis of $m_0$ breaks against $m_0 + 1$ breaks is rejected. This happens, in large samples, with probability no more than $\alpha_T$, given $m_0$ breaks. Formally, by (50) with $\ell = m_0$,

(53)  $$P(\hat m > m_0) \le P\big(F_T(m_0 + 1 | m_0) > c_{T,m_0} \mid m_0\big) \le \alpha_T.$$

Combining (52) and (53), we see that $P(\hat m \ne m_0) \to 0$ as $T \to \infty$. That is, the estimate of the number of breaks obtained using the sequential test is consistent.  □
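The stopping rule analyzed in this proof can be sketched in a few lines: the selected number of breaks is the first $\ell$ at which $F_T(\ell+1|\ell)$ fails to exceed its critical value $c_{T,\ell}$. The function name and the numbers in the usage note are hypothetical and purely illustrative; they are not taken from the tables or the empirical applications.

```python
def select_num_breaks(stats, crit):
    """Sequential selection of the number of breaks.

    stats[l] is the statistic F_T(l+1 | l) and crit[l] the critical value
    c_{T,l}. Returns the first l whose null of l breaks is not rejected.
    If every tested null is rejected, returns the number of tests carried
    out (the largest number of breaks considered).
    """
    tested = 0
    for l, (f, c) in enumerate(zip(stats, crit)):
        if f <= c:          # cannot reject l breaks against l + 1
            return l
        tested = l + 1
    return tested


# Hypothetical statistics and critical values, for illustration only.
m_hat = select_num_breaks([30.0, 25.0, 3.0], [10.0, 11.0, 12.0])
```

With these illustrative inputs the nulls of 0 and 1 breaks are rejected and the null of 2 breaks is not, so `m_hat` equals 2.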
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B      Computational  Appendix. 

In  this  section,  we  discuss  an  algorithm  based  on  the  principle  of  dynamic  programming 
that  allows  the  computation  of  estimates  of  the  break  points  as  global  minimizers  of 
the  sum  of  squared  residuals.  The  method  is  directly  applicable  to  the  case  of  a  pure 
structural  change  model.  Useful  references  include  Guthery  (1974),  Bellman  and  Roth 
(1969)  and  Fisher  (1958). 

A standard grid search procedure to obtain global minimizers with $m$ breaks would require least squares operations of order $O(T^m)$. The dynamic programming approach provides an efficient method that requires least squares operations of order $O(T^2)$ at most for any number of breaks; hence substantial savings in computation can be achieved when estimating a model with more than two breaks (the method also allows some savings in the case of two breaks). The basic reason for this reduction in computation is fairly intuitive once it is realized that, with a sample of size $T$, the total number of possible segments is at most $T(T+1)/2$ and is therefore of order $O(T^2)$. The dynamic programming algorithm can be seen as an efficient way to compare possible combinations of these segments to achieve a minimum sum of squared residuals.

In practice, fewer than $T(T+1)/2$ segments are permissible. First, some minimum distance between each break may be imposed, as is done in the construction of the tests discussed in Section 4. Let this minimum distance be denoted by $h$. Note that $h < q$ is possible, in which case the sum of squared residuals is zero; for simplicity we suppose without loss of generality that $h > q$. This implies a reduction in the number of segments to be considered of $(h-1)T - (h-2)(h-1)/2$. Now the largest possible segment must be such as to allow $m$ other segments before or after. For example, when the segment starts at a date between 1 and $h-1$, the maximal length of this segment is $T - hm$ when $m$ breaks are allowed (i.e., $m+1$ regimes). This allows a further reduction in the total number of segments considered of $h^2 m(m+1)/2 - mh$. Hence all the relevant information can be obtained from the examination of the sums of squared residuals associated with

$$T(T+1)/2 - (h-1)T + (h-2)(h-1)/2 - h^2 m(m+1)/2 + mh$$

segments. We therefore need to evaluate the sum of squared residuals associated with segments having the following starting and ending dates:

  starting date                                  ending date

  $i = 1, \ldots, hm - 1$                        $j = h + i - 1, \ldots, T - hm$

  $i = \ell h, \ldots, (\ell+1)h - 1$            $j = h + i - 1, \ldots, T - (m-\ell)h$    $(\ell = 1, \ldots, m-1)$

  $i = hm + 2, \ldots, T - hm + 1$               $j = h + i - 1, \ldots, T$

This can be achieved using standard updating formulae to calculate recursive residuals. Indeed, all the relevant information can be calculated from $T - hm + 1$ sets of recursive residuals and the fact that the sum of squared residuals using, say, $t$ observations is the sum of squared residuals using $t - 1$ observations plus the square of the recursive residual at time $t$. Hence, the number of matrix inversions needed is simply of order $O(T)$. To be precise, let $v(i,j)$ be the recursive residual at time $j$ obtained using a sample that starts at date $i$, and let $SSR(i,j)$ be the sum of squared residuals obtained by applying least squares to a segment that starts at date $i$ and ends at date $j$. We note the following recursive relation (e.g., Brown, Durbin and Evans (1975)):

$$SSR(i,j) = SSR(i,j-1) + v(i,j)^2.$$

All the relevant information is contained in the values $SSR(i,j)$ for the combinations $(i,j)$ indicated above.
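The recursive relation can be illustrated with a minimal sketch for the simplest case, a mean-only regression ($z_t = 1$, $q = 1$). That restriction, the function name `ssr_table` and the 0-based indexing are assumptions made here for illustration; with general regressors the same update applies, with the standardized recursive residual computed from the usual updating formulae. For simplicity the sketch stores every segment of length at least $h$, rather than only the admissible combinations listed above.

```python
import math


def ssr_table(y, h):
    """SSR(i, j) for all segments y[i..j] (0-based, inclusive) of length >= h.

    Mean-only regression: the fitted value on a segment is its sample mean.
    Each entry is built from the previous one with the recursive residual.
    """
    T = len(y)
    table = {}
    for i in range(T - h + 1):
        mean, ssr = y[i], 0.0            # one observation: perfect fit
        if h <= 1:
            table[(i, i)] = 0.0
        for j in range(i + 1, T):
            n = j - i                    # observations already in the segment
            # Standardized recursive residual v(i, j): one-step-ahead forecast
            # error divided by sqrt(1 + 1/n), its variance scaling for z_t = 1.
            v = (y[j] - mean) / math.sqrt(1.0 + 1.0 / n)
            ssr += v * v                 # SSR(i, j) = SSR(i, j-1) + v(i, j)^2
            mean += (y[j] - mean) / (n + 1)
            if j - i + 1 >= h:
                table[(i, j)] = ssr
    return table


y_demo = [1.0, 2.0, 4.0, 7.0]
tab_demo = ssr_table(y_demo, 2)
```

Each starting date requires one pass over the later observations, so the whole table costs order $T^2$ scalar updates, in line with the operation count discussed above.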

Once the sums of squared residuals of the relevant segments have been computed and stored, a dynamic programming approach can be used to evaluate which partition achieves a global minimization of the overall sum of squared residuals. This method essentially proceeds via a sequential examination of optimal one-break (or two-segment) partitions. Let $SSR(\{T_{r,n}\})$ denote the sum of squared residuals associated with the optimal partition containing $r$ breaks using the first $n$ observations. The optimal partition can be obtained by solving the following recursive problem:

(54)  $$SSR(\{T_{m,T}\}) = \min_{mh \le j \le T-h} \big[ SSR(\{T_{m-1,j}\}) + SSR(j+1, T) \big].$$

It is instructive to write (54) in the following way:

$$\begin{aligned} SSR(\{T_{m,T}\}) = \min_{mh \le j_1 \le T-h} \Big[ & SSR(j_1+1, T) + \min_{(m-1)h \le j_2 \le j_1-h} \big[ SSR(j_2+1, j_1) \\ & + \min_{(m-2)h \le j_3 \le j_2-h} \big[ SSR(j_3+1, j_2) + \cdots \\ & + \min_{h \le j_m \le j_{m-1}-h} [ SSR(1, j_m) + SSR(j_m+1, j_{m-1}) ] \cdots \big] \big] \Big]. \end{aligned}$$

Looking at the last displayed minimization problem, we see that the procedure starts by evaluating the optimal one-break partition for all sub-samples that allow a possible break ranging from observations $h$ to $T - mh$. Hence, the first step is to store a set of $T - (m+1)h + 1$ optimal one-break partitions along with their associated sums of squared residuals. Each of the optimal partitions corresponds to a subsample ending at a date ranging from $2h$ to $T - (m-1)h$. Consider now the next step, which proceeds in a search for optimal partitions with two breaks. Such partitions have ending dates ranging from $3h$ to $T - (m-2)h$. For each of these possible ending dates, the procedure looks at which one-break partition can be inserted to achieve a minimal sum of squared residuals. The outcome is a set of $T - (m+1)h + 1$ optimal two-break (or three-segment) partitions. The method continues sequentially until a set of $T - (m+1)h + 1$ optimal $(m-1)$-break partitions is obtained, with ending dates ranging from $mh$ to $T - h$. The final step is to see which of these optimal $(m-1)$-break partitions yields an overall minimal sum of squared residuals when combined with an additional segment. The method can therefore be viewed as a sequential updating of $T - (m+1)h + 1$ segments into optimal one, two and up to $m - 1$ breaks partitions (or into two, three and up to $m$ sub-segments); the last step simply creates a single optimal $m$-break (or $m + 1$ segments) partition.
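The recursion (54) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it treats the pure structural change model with a mean shift only ($z_t = 1$), obtains segment sums of squared residuals from cumulative sums instead of recursive residuals, and fills the table for every ending date up to $T$ rather than only the admissible ones. The names `segment_ssr` and `optimal_breaks` are hypothetical.

```python
def segment_ssr(y):
    """Return f(i, j): SSR of y[i..j] (0-based, inclusive) around its own mean."""
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def f(i, j):
        n = j - i + 1
        tot = s1[j + 1] - s1[i]
        return (s2[j + 1] - s2[i]) - tot * tot / n

    return f


def optimal_breaks(y, m, h):
    """Global SSR-minimizing partition with m breaks and minimum length h.

    Returns (minimal SSR, break dates), each break date being the 1-based
    index of the last observation of a regime.
    """
    T = len(y)
    if (m + 1) * h > T:
        raise ValueError("need (m + 1) * h <= T")
    ssr = segment_ssr(y)
    # cost[r][n]: minimal SSR with r breaks using the first n observations
    cost = [dict() for _ in range(m + 1)]
    arg = [dict() for _ in range(m + 1)]
    for n in range(h, T + 1):
        cost[0][n] = ssr(0, n - 1)
    for r in range(1, m + 1):
        for n in range((r + 1) * h, T + 1):
            best, best_j = float("inf"), None
            for j in range(r * h, n - h + 1):   # last break: r*h <= j <= n-h
                c = cost[r - 1][j] + ssr(j, n - 1)
                if c < best:
                    best, best_j = c, j
            cost[r][n], arg[r][n] = best, best_j
    breaks, n = [], T
    for r in range(m, 0, -1):                   # backtrack the stored argmins
        n = arg[r][n]
        breaks.append(n)
    return cost[m][T], breaks[::-1]


demo_ssr, demo_breaks = optimal_breaks(
    [0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 1.0, 1.0, 1.0], m=2, h=2)
```

On this noiseless series the three regimes are recovered exactly: `demo_breaks` is `[3, 6]` and `demo_ssr` is zero.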

This dynamic programming method to obtain global minimizers of the sum of squared residuals cannot be applied directly to the case of a partial structural change model. This is basically due to the fact that we cannot concentrate out the parameters $\beta$ without knowing the appropriate partition, i.e. the estimate of $\beta$ associated with a global minimization depends on the optimal partition which we are trying to obtain. However, a simple iterative procedure is available. Let $\theta = (\delta, T_1, \ldots, T_m)$; we can write the sum of squared residuals as a function of the vectors $\beta$ and $\theta$, i.e. $SSR(\beta, \theta)$. As discussed in Sargan (1964), we can minimize $SSR(\beta, \theta)$ in an iterative fashion as follows. First minimize with respect to $\theta$ keeping $\beta$ fixed and then minimize with respect to $\beta$ keeping $\theta$ fixed, and iterate. Each iteration assures a decrease in the objective function. The convergence properties of this scheme are discussed in Sargan (1964).

Note that the first step, minimizing with respect to $\theta$ keeping $\beta$ fixed, amounts to applying the dynamic programming algorithm discussed above with $y_t - x_t'\beta$ as the dependent variable. Since $\beta$ is fixed this is, indeed, a step involving a pure structural change model. Let $\theta^* = (\delta^*, \{T^*\})$ be the associated estimate from this first stage (with $\{T^*\} = (T_1^*, \ldots, T_m^*)$). The second step is a simple linear regression with dependent variable $y_t - z_t'\delta_j^*$ for $t$ in regime $j$ ($j = 1, \ldots, m+1$) (the regimes being defined by the partition $\{T^*\}$) and independent variable $x_t$.

An issue that remains is the choice of the initial value of the vector $\beta$ to start the iteration. We suggest the following procedure. First apply the dynamic programming algorithm treating all coefficients as subject to change, i.e. treat the model as one of pure structural change, and let $\theta^a = (\delta^a, \{T^a\})$ be the estimates of $\delta$ and $(T_1, \ldots, T_m)$ that then minimize the sum of squared residuals. The initial value of the vector $\beta$ is taken as the OLS estimate in a regression of $y_t - z_t'\delta_j^a$ on $x_t$ for $t$ in regime $j$ ($j = 1, \ldots, m+1$), the regimes being defined by the partition $\{T^a\}$. The reason for such a choice of the startup value is that the estimates of the break fractions are consistent even when some of the coefficients of the parameter vector $\delta = (\delta_1, \ldots, \delta_q)$ do not change across regimes, provided at least one does change at each break date. Hence, this choice of the starting value for $\beta$ is asymptotically equivalent to the estimate associated with the global minimization and should, accordingly, allow obtaining global minimizers with respect to all the parameters in a few iterations.
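The alternating scheme for the partial structural change model can be sketched under strong simplifying assumptions: a scalar regressor $x_t$ with a fixed coefficient $\beta$, and an intercept shift only ($z_t = 1$). The names `segment_ssr`, `optimal_breaks` and `fit_partial` are hypothetical. For brevity the starting value is a user-supplied `beta0` (default 0) rather than the startup value suggested above, and the stopping rule on successive values of $\beta$ is likewise an assumption of this sketch.

```python
def segment_ssr(y):
    """Return f(i, j): SSR of y[i..j] (0-based, inclusive) around its own mean."""
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)
    return lambda i, j: (s2[j + 1] - s2[i]) - (s1[j + 1] - s1[i]) ** 2 / (j - i + 1)


def optimal_breaks(y, m, h):
    """Dynamic programming step: (minimal SSR, 1-based break dates)."""
    T = len(y)
    ssr = segment_ssr(y)
    cost = [dict() for _ in range(m + 1)]
    arg = [dict() for _ in range(m + 1)]
    for n in range(h, T + 1):
        cost[0][n] = ssr(0, n - 1)
    for r in range(1, m + 1):
        for n in range((r + 1) * h, T + 1):
            best, best_j = float("inf"), None
            for j in range(r * h, n - h + 1):
                c = cost[r - 1][j] + ssr(j, n - 1)
                if c < best:
                    best, best_j = c, j
            cost[r][n], arg[r][n] = best, best_j
    breaks, n = [], T
    for r in range(m, 0, -1):
        n = arg[r][n]
        breaks.append(n)
    return cost[m][T], breaks[::-1]


def fit_partial(y, x, m, h, beta0=0.0, tol=1e-10, max_iter=100):
    """Alternate between (delta, breaks) given beta and beta given (delta, breaks)."""
    T, beta = len(y), beta0
    for _ in range(max_iter):
        # Step 1: beta fixed, pure structural change in y_t - x_t * beta
        resid = [yt - xt * beta for yt, xt in zip(y, x)]
        breaks = optimal_breaks(resid, m, h)[1]
        bounds = [0] + breaks + [T]
        delta = [sum(resid[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]
        # Step 2: partition and delta fixed, OLS of y_t - delta_j on x_t
        num = den = 0.0
        for k, (a, b) in enumerate(zip(bounds, bounds[1:])):
            for t in range(a, b):
                num += x[t] * (y[t] - delta[k])
                den += x[t] * x[t]
        new_beta = num / den
        if abs(new_beta - beta) < tol:
            return new_beta, delta, breaks
        beta = new_beta
    return beta, delta, breaks


# Noiseless illustration: beta = 2, intercepts 0, 10, -5, breaks at 6 and 12.
x_demo = [(-1.0) ** (t + 1) for t in range(18)]
d_demo = [0.0] * 6 + [10.0] * 6 + [-5.0] * 6
y_demo = [2.0 * xt + dt for xt, dt in zip(x_demo, d_demo)]
beta_hat, delta_hat, breaks_hat = fit_partial(y_demo, x_demo, m=2, h=2)
```

In this noiseless illustration the scheme settles after two passes on $\beta = 2$ with breaks at dates 6 and 12. With noisy data, convergence to the global minimizer depends on the starting value, which is the motivation for the startup choice described above.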
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Table  1:  Asymptotic  Critical  Values  of  the  Multiple-Break  Test. 
The entries are the quantiles $x$ such that $P(\sup F_{k,q} \le x/q) = \alpha$.

                              Number of Breaks, k
  q     α        1       2       3       4       5       6       7       8       9     DmaxF

  1    .90     8.02    7.87    7.07    6.61    6.14    5.74    5.40    5.09    4.81     8.78
       .95     9.63    8.78    7.85    7.21    6.69    6.23    5.86    5.51    5.20    10.17
       .975   11.17    9.81    8.52    7.79    7.22    6.70    6.27    5.92    5.56    11.52
       .99    13.58   10.95    9.37    8.50    7.85    7.21    6.75    6.33    5.98    13.74

  2    .90    11.02   10.48    9.61    8.99    8.50    8.06    7.66    7.32    7.01    11.69
       .95    12.89   11.60   10.46    9.71    9.12    8.65    8.19    7.79    7.46    13.27
       .975   14.53   12.64   11.20   10.29    9.69    9.10    8.64    8.18    7.80    14.69
       .99    16.64   13.78   12.06   11.00   10.28    9.65    9.11    8.66    8.22    16.79
  3    .90    13.43   12.73   11.76   11.04   10.49   10.02    9.59    9.21    8.86
       .95    15.37   13.84   12.64   11.83   11.15   10.61   10.14    9.71    9.32
       .975   17.17   14.91   13.44   12.49   11.75   11.13   10.62   10.14    9.72
       .99    19.25   16.27   14.48   13.40   12.56   11.80   11.22   10.67   10.19    19.38

4 

.90 

.95 

.975 

.99 
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21.47     18.75     17.26     16.13      15.40     14.75      14.19     13.66     13.17 
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27.21      24.20     22.41      21.29      20.39      19.63      18.98      18.34      17.78 
29.60     25.66     23.44     22.22      21.22     20.40      19.66      19.03      18.46 
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27.32 
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.975 
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24.75     23.15     21.98     21.12     20.37      19.72      19.13      18.58      18.09 
27.08      24.55     23.16     22.08      21.22     20.49      19.90      19.29      18.79 
29.13     25.92     24.14     22.97     21.98     21.28     20.59      19.98      19.39 
31.66     27.42     25.13     24.01      23.06     22.18     21.35     20.63      19.94 
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33.62     29.14     26.90     25.58      24.44      23.49     22.75     22.09     21.47 

26.66 
28.75

Table 2: Asymptotic Critical Values of the Sequential Test $F_T(\ell+1|\ell)$.
The entries are the quantiles $x$ such that $G_{q,\eta}(x)^{\ell+1} = \alpha$.
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23.90 

24.34 

24.62 

25.14 

25.34 

25.51 

  q     α      ℓ = 0     1       2       3       4       5       6       7       8       9

  5    .975   21.47   23.34   24.37   25.14   25.58   25.79   25.96   26.39   26.60   26.84
       .99    23.99   25.58   26.32   26.84   27.39   27.86   27.90   28.32   28.38   28.39

  6    .90    19.38   21.51   22.81   23.64   24.19   24.59   24.86   25.27   25.53   25.87
       .95    21.59   23.72   24.66   25.29   25.89   26.36   26.84   27.10   27.26   27.40
       .975   23.73   25.41   26.37   27.10   27.42   28.02   28.39   28.75   29.13   29.44
       .99    25.95   27.42   28.60   29.44   30.18   30.52   30.64   30.99   31.25   31.33

  7    .90    21.23   23.41   24.51   25.07   25.75   26.30   26.74   27.06   27.46   27.70
       .95    23.50   25.17   26.34   27.19   27.96   28.25   28.64   28.84   28.97   29.14
       .975   25.23   27.24   28.25   28.84   29.14   29.72   30.41   30.76   31.09   31.43
       .99    28.01   29.14   30.61   31.43   32.56   32.75   32.90   33.25   33.25   33.85

  8    .90    22.92   25.15   26.38   27.09   27.77   28.15   28.61   28.90   29.19   29.49
       .95    25.22   27.18   28.21   28.99   29.54   30.05   30.45   30.79   31.29   31.75
       .975   27.21   29.01   30.09   30.79   31.80   32.50   32.81   32.86   33.20   33.60
       .99    29.60   31.80   32.84   33.60   34.23   34.57   34.75   35.01   35.50   35.65

  9    .90    24.75   26.99   28.11   29.03   29.69   30.18   30.61   30.93   31.14   31.46
       .95    27.08   29.10   30.24   30.99   31.48   32.46   32.71   32.89   33.15   33.43
       .975   29.13   31.04   32.48   32.89   33.47   33.98   34.25   34.74   34.88   35.07
       .99    31.66   33.47   34.60   35.07   35.49   37.08   37.12   37.23   37.47   37.68

 10    .90    26.13   28.40   29.68   30.62   31.25   31.81   32.37   32.78   33.09   33.53
       .95    28.49   30.65   31.90   32.83   33.57   34.27   34.53   35.01   35.33   35.65
       .975   30.67   32.87   34.27   35.01   35.86   36.32   36.65   36.90   37.15   37.41
       .99    33.62   35.86   36.68   37.41   38.20   38.70   38.91   39.09   39.11   39.12

Table 3: Empirical Results: U.S. Ex-Post Real Interest Rate

Specifications

  $z_t = \{1\}$      $q = 1$      $p = 0$      $h = 2$      $M = 5$

Tests (1)

  $\sup F_T(1)$   $\sup F_T(2)$   $\sup F_T(3)$   $\sup F_T(4)$   $\sup F_T(5)$
      58.53           44.16           53.76           51.88           44.76

Number of Breaks Selected (2)

  Sequential Procedure    2
  LWZ                     2
  BIC                     2

Parameter Estimates with Two Breaks (3)

  $\hat\delta_1$ =  1.36  (.16)
  $\hat\delta_2$ = -1.80  (.50)
  $\hat\delta_3$ =  5.64  (.59)
  $\hat T_1$ = 72:3  (71:2-73:4)
  $\hat T_2$ = 80:3  (80:2-80:4)

(1) The $\sup F_T(k)$ tests and the reported standard errors and confidence intervals allow for the possibility of serial correlation in the disturbances. The heteroskedasticity and autocorrelation consistent covariance matrix is constructed following Andrews (1991) and Andrews and Monahan (1992) using a quadratic kernel with automatic bandwidth selection based on an AR(1) approximation. The residuals are pre-whitened using a VAR(1).

(2) We use a 1% size for the sequential test $\sup F_T(\ell+1|\ell)$.

(3) In parentheses are the standard errors (robust to serial correlation) for $\hat\delta_i$ ($i = 1, 2, 3$) and the 95% confidence intervals for $\hat T_1$ and $\hat T_2$.


Table 4: Empirical Results: U.K. CPI Inflation Rate 1948-1987

Specifications

  $z_t = \{1, y_{t-1}\}$      $q = 2$      $p = 0$      $h = 5$      $M = 5$

Tests

  $\sup F_T(1)$     $\sup F_T(2)$     $\sup F_T(3)$     $\sup F_T(4)$     $\sup F_T(5)$
       5.34             12.69             12.74             10.76              8.58

  $\sup F_T(2|1)$   $\sup F_T(3|2)$   $\sup F_T(4|3)$   $\sup F_T(5|4)$   $\sup F_T(6|5)$
      10.70              7.47              4.58              1.71              1.66

Number of Breaks Selected

  Sequential Procedure    0
  LWZ                     1
  BIC                     3

4,i 

<$2,1 

Parameter  Estimates  with  one 
=  .025  (.013)                       4,2 
=  .274  (.316)                      4,2 
=  1967  (1961-1973) 

Break 
=  .024  (.013) 
=  .739  (.125) 

4,i  =  -021  (.010) 
4,1  =  -488  (.220) 

f2 

Parameter  Estimates  with  Two 
4,2  = -130  (.029)       4,3  = -011  (-019) 
4,2  =  -115  (.206)       4,3  =  -633  (.223) 

=  1973  (1972-1974) 

=  1980  (1979-1981)

Figure 3: A particular configuration of $(T_1, T_2, T_3)$ in the set $T_\epsilon(C)$ defined in the proof of Proposition 3.

[Figure: time line marking the candidate break dates $T_1, T_2, T_3$ against the true break dates, with the coefficient vectors associated with each interval.]



